Abstract: This paper presents the first network-coded multiple access
(NCMA) system prototype operated on high-order modulations up to 16-QAM.
NCMA jointly exploits physical-layer network coding (PNC) and multiuser
decoding (MUD) to boost the throughput of multipacket reception systems.
Direct generalization of the existing NCMA decoding algorithm, originally
designed for BPSK, to high-order modulations leads to severe throughput
degradation. This degradation is caused by the relative phase offset
between the signals received from different nodes. To circumvent the
phase offset problem, this paper investigates an NCMA system with
multiple receive antennas at the access point (AP), referred to as
MIMO-NCMA. We put forth a low-complexity symbol-level NCMA decoder that,
together with MIMO, can substantially alleviate the performance
degradation induced by the relative phase offset. To demonstrate the
feasibility and advantage of MIMO-NCMA for high-order modulations, we
implemented our designs on software-defined radio. Our experimental
results show that the throughput of QPSK MIMO-NCMA is double that of both
BPSK NCMA and QPSK MUD at SNR = 10 dB. For higher SNRs at which 16-QAM
can be supported, the throughput of MIMO-NCMA can be as high as 3.5 times
that of BPSK NCMA. Overall, this paper provides an implementable
framework for high-order modulated NCMA.

Index Terms: Physical-layer network coding, multi-user detection,
multiple access, network-coded multiple access, high-order modulation,
implementation


I. Introduction

Multipacket reception is conventionally realized by multiuser decoding
(MUD) techniques using orthogonal signaling [?] (e.g., TDMA, CDMA and
OFDMA). MUD is now evolving from orthogonal (or semi-orthogonal)
signaling toward non-orthogonal signaling, namely, Non-orthogonal
Multiple Access (NOMA) [?]-[?]. NOMA aims to better utilize the
frequency bands by allowing more users to transmit together in the same
frequency at the same time. This paper studies a new NOMA architecture
named Network-Coded Multiple Access (NCMA).

The key idea of NCMA is to combine physical-layer network coding (PNC)
and MUD to enable multipacket reception. PNC, first introduced in [?],
turns mutual interference between signals from simultaneous transmitters
into useful network-coded information, thereby improving the throughput
of wireless relay networks. Most prior PNC works focused on relay
networks. NCMA was the first multiple access scheme that explored the use
of PNC decoding for non-relay networks, e.g., the uplink of wireless
local area networks (WLAN) [?].

Fig. 1: System Model for NCMA.

Fig. 1 shows a typical WLAN setup in which two end nodes send messages to
a common access point (AP). The two end nodes are allowed to send their
packets simultaneously to boost throughput. NCMA jointly exploits PNC and
MUD through a cross-layer design involving channel coding/decoding at the
MAC and PHY layers, as explained below. Each client node (e.g., nodes A
and B in Fig. 1) partitions and encodes one large source message (e.g.,
messages $M^A$ and $M^B$ for nodes A and B) into multiple small packets
at the MAC layer (see Fig. 2). At the PHY layer, additional channel
coding is performed on each small packet before it is transmitted to the
AP. At the AP, two PHY-layer decoders are used to decode useful
information from the overlapped packets transmitted simultaneously by
different client nodes: (i) the PNC decoder attempts to decode a
network-coded packet (e.g., a bit-wise XOR packet $A \oplus B$), while
(ii) the MUD decoder attempts to decode the individual native packets A
and B. In [?], a two-layer decoding approach was proposed to make use of
the PNC packet $A \oplus B$ efficiently. For example, experimental
results in [?] showed that at an SNR of 8.5 dB, with probability 22% the
MUD decoder can decode only one of the packets A and B. When only one of
the two native packets is decoded, with probability 85% the PNC decoder
can decode $A \oplus B$. In this scenario, the PHY layer can use
$A \oplus B$ and the available native packet to recover the missing
native packet, A or B. Furthermore, [?] showed that when neither packet A
nor B is decoded, with probability 40% the PNC decoder can still decode
$A \oplus B$. Such "lone" PNC packets are not useful at the PHY layer for
the recovery of native packets A and B. However, the correlations among
successive PHY-layer packets introduced by the MAC-layer channel coding
allow NCMA to make use of the PHY-layer network-coded packets to recover
the two MAC-layer native messages $M^A$ and $M^B$. With the two-layer
channel coding, NCMA makes good use of the PHY-layer network-coded
packets for the recovery of native messages at the MAC layer (see
Section III for details).

Prior works on NCMA [?], [?] only explored a simple prototype with BPSK
modulation. To increase the system throughput, especially in the medium
and high signal-to-noise ratio (SNR) regimes, it is desirable to adopt
high-order modulations. This paper is an attempt to fill the gap.

Fig. 2: The general architecture of an NCMA node's information
processing at MAC and PHY layers.

Fig. 3: NCMA PHY-layer channel decoders.

It turns out that there are a number of subtleties when applying
high-order modulations in NCMA. As will be shown in Section IV, direct
generalization of the scheme in [?], [?] from BPSK to QPSK leads to very
low system throughput. Rather than providing a boost, at 8.5 dB SNR the
PNC decoder can successfully decode $A \oplus B$ with a probability of
only 5%. The system throughput drops by around 80% when we move from
BPSK to QPSK directly (see Fig. [?] and the elaboration in Section VII).

This drastic throughput degradation is caused by the relative phase
offset between the signals of simultaneously transmitted packets. Here,
we remark that certain advanced channel decoding methods expounded in the
literature may partially solve the phase offset problem, but these are
generally complex iterative decoding methods that induce large decoding
latency, and therefore are not amenable to practical implementation.

A goal of ours is to build a high-order modulated NCMA system that can
operate in real time. Thus, in this paper, we focus on a non-iterative
Viterbi decoder, assuming the use of convolutional codes. To address the
relative phase offset issue, we explore the use of two antennas at the AP
(the client nodes still have one antenna each) together with
"symbol-level" decoding. We refer to our system as MIMO-NCMA.


The main results of our investigations can be summarized 
as follows: 

(1) For MIMO-NCMA, we first study a low-complexity demodulation scheme
that is compatible with the standard point-to-point Viterbi decoder that
admits the soft information of individual bits as inputs. A high-order
modulated symbol can be broken into multiple bits as the Viterbi
decoder's inputs. We refer to such a decoder as the "bit-level" NCMA
decoder. With two antennas at the AP and bit-level decoding, the system
throughput of QPSK-modulated NCMA can be improved substantially.

It is worth pointing out that, although our setup shares the same
hardware setting with distributed MIMO, a typical MIMO system only
incorporates MUD but not PNC decoding at the AP; furthermore, most
distributed MIMO systems require transmitter precoding, while our client
nodes do not perform transmitter precoding, which keeps the complexity of
the overall system low.

(2) Moving from QPSK to 16-QAM (or higher-order modulations), bit-level
decoding may lead to severe performance degradation. This is because in
NCMA, unlike in point-to-point systems, the two users' constellation
points (each containing 4 bits under Gray mapping) are correlated within
an overlapped symbol, but the bit-level decoders treat the 4 bits as
independent information to match the standard point-to-point Viterbi
decoder. To avoid information processing loss in the demodulator, this
paper puts forth an enhanced PHY-layer decoder for 16-QAM (extendable to
higher-order modulations), referred to as the "symbol-level" decoder. The
symbol-level NCMA decoder contains PNC and MUD demodulators that can
retain the information on inter-correlations among the bits inside a
symbol. To accommodate the demodulated symbols, rather than bits, a
symbol-level Viterbi decoder is applied to accept symbol log-likelihoods.
We further show that our proposed 16-QAM symbol-level Viterbi decoder has
exactly the same decoding complexity and the same order of processing
time as its bit-level counterpart.

For performance evaluation, we implemented the bit-level and symbol-level
PHY-layer decoders on software-defined radio. Our experiments show that,
at 10 dB SNR, the throughput of QPSK MIMO-NCMA is higher than those of
conventional MIMO MUD systems operated with distributed ZF and MMSE
decoders by 100% and 80%, respectively. More importantly, our
experimental results show that, at SNR = 10 dB, the throughput of our
QPSK MIMO-NCMA is double that of the prior BPSK NCMA. At SNR = 20 dB,
when 16-QAM can be supported, the throughput of MIMO-NCMA can be as high
as 3.5 times that of the prior BPSK NCMA. Overall, we provide an
implementable framework for high-order modulated NCMA.


The remainder of this paper is organized as follows: Section II overviews
prior related work. Section III describes the key idea of the NCMA
system. Section IV studies the penalty induced by relative phase offsets
in high-order modulations. Following that, Section V puts forth our
solutions. Section VI introduces enhanced PHY-layer decoders. Section VII
presents the implementation details of our approach and the associated
experimental results. Finally, Section VIII concludes this paper.
TABLE I: Related work on iterative and non-iterative decoding schemes for
NCMA systems with different modulations (we focus on PNC decoding here;
for more MUD schemes, see [1]).

Modulations \ Methods        Non-iterative                        Iterative
BPSK                         [6], [7], [?]                        [?]-[?]
High-order modulations       [?] (non-channel-coded);
(QPSK and beyond)            Our work: joint use of MIMO and
                             symbol-level decoding


II. Related Work

Physical-layer Network Coding: Ref. [?] first proposed PNC to increase
the throughput of a two-way relay network (TWRN). In a TWRN, two end
nodes exchange information via a relay. PNC doubles the throughput of a
TWRN operated with the traditional scheduling scheme [?]. PNC has been
studied and evaluated in depth: we refer interested readers to [?], [?],
[?]-[?] and the references therein for details. Following the tradition
of [?], prior PNC work focused almost exclusively on relay networks. By
contrast, NCMA was the first attempt to apply PNC in non-relay networks
(i.e., multiple access in wireless networks) [?], [?]. There has been
existing work focusing on high-order modulated PNC [?]-[?]. This work,
however, was theoretical in nature and assumed perfect synchronization
(e.g., no phase offset between the concurrently transmitted signals). The
practical implementation of such synchronized systems is much more
challenging than the current system studied in our paper.

The phase offset issue in high-order modulated PNC has also been studied
and partially solved through different decoding schemes, as shown in
Table I. Unlike the existing work in Table I, the solution of using
multiple antennas and symbol-level decoding in our paper provides a
non-iterative decoding scheme that is simple to implement.

Coding for Multiple Access Channels: Besides NCMA [?], [?], there have
been other efforts to apply network coding in multiple access networks.
The major difference between NCMA and this work is that NCMA tries to
decode more than one equation per reception (e.g., tries to decode both
the PNC packet and the individual native packets) from one overlapped
packet, while most other work aims to get one equation per reception
(either the PNC packet or one native packet). For example, [?]-[?]
explored forming linear equations from the collided packets to derive
source packets. Refs. [23], [24] treated the collisions in the multiple
access channel as erasure-correcting codes on graphs. But [?], [23], [24]
only form one equation for each overlapped packet, whereas NCMA can form
one or two equations for each overlapped packet depending on the
instantaneous channel condition. Furthermore, the decoding in [?], [?] is
based on PHY-layer equations only, while NCMA makes use of another
MAC-layer channel coding based on the PHY-layer PNC packets.

Decoding source packets from concurrent transmissions, e.g.,
Non-orthogonal Multiple Access (NOMA), was also studied in [?]-[?], [?],
[?], where successive interference cancellation (SIC) or other MUD
schemes were adopted for multipacket reception. Again, none of them
considered PNC decoding. A general comparison between NOMA and the
traditional TDMA, FDMA, and CDMA schemes can be found in [?].

Distributed MIMO: As with distributed MIMO systems, in MIMO-NCMA the AP
also has multiple antennas. Distributed MIMO can enable spatially
separated transmitters to form a virtual MIMO system for multiple access.
In the literature, [?]-[29] studied distributed MIMO systems to increase
the system throughput, and we refer to them as MIMO-MUD since they only
focus on MUD decoding, without incorporating PNC decoding. In this paper,
we consider classical MIMO-MUD with zero-forcing (ZF) and minimum mean
square error (MMSE) decoders as our benchmarks. More sophisticated
distributed MIMO decoders can be found in [?].

Symbol-level Decoding in Point-to-point Systems: The idea of symbol-level
decoding can also be used in conventional point-to-point systems.
However, unlike in NCMA, the motivation is lacking there because
bit-level decoding already yields performance that approaches the limit
of the Shannon capacity. For example, Gray-mapped bit-interleaved coded
modulation (BICM) point-to-point systems (i.e., systems with bit-level
decoding) have previously been shown to be capacity optimal even without
iterative decoding, if the bit positions in the symbol are independent
[31], [32]. In NCMA, however, symbol-level decoding can lead to a large
decoding improvement over the bit-level counterpart because of the
correlations among bits within the overlapped symbol. The detailed
comparisons between NCMA symbol-level and bit-level decoding will be
presented in Section V-C and Section VII-B2, including the simulation and
experimental results, respectively.


III. NCMA Overview
A. General System Model for NCMA 

We study a multiple access system where two end nodes, A and B, transmit
information to an access point (AP) simultaneously, as shown in Fig. 1.
We consider the use of both physical-layer network coding (PNC) and
multiuser decoding (MUD) to boost system throughput. This system is
referred to as a network-coded multiple access (NCMA) system [?].

NCMA includes both MAC-layer and PHY-layer operations. With respect to
Fig. 2, at the MAC layer, a large message $M^A$ of node A is divided and
encoded into multiple packets $C_i^A$, $i = 1, 2, \ldots$ Similarly, a
large message $M^B$ of node B is encoded into multiple packets $C_i^B$,
$i = 1, 2, \ldots$ We assume the use of the Reed-Solomon (RS) code when
coding a large message into multiple packets. At the PHY layer, each
packet $C_i^A$ (or $C_i^B$) is further channel-encoded into $V_i^A$
($V_i^B$) for reliable transmission. We adopt the convolutional code as
the PHY-layer channel code. NCMA is a time-slotted system. That is, each
end node $j \in \{A, B\}$ transmits packets $V_1^j, V_2^j, \ldots$ to the
AP in successive time slots, and the two end nodes' packets (i.e.,
$V_i^A$ and $V_i^B$) are configured to be transmitted simultaneously in
the same time slot $i$.

Fig. 4: An example of PHY-layer packet reception patterns for
concurrently transmitted packets using PNC and MUD decoders. 0 means the
corresponding packets cannot be decoded.

Fig. 5: NCMA MAC-layer decoding and bridging example, using an L = 3 RS
code: (a) The decoding outcomes after PHY-layer bridging; (b) MAC-layer
RS decoding and bridging using lone XOR packets.

In the uplink transmission of NCMA, at the PHY layer, as shown in Fig. 3,
the AP first detects how many nodes are transmitting. When only one node
is transmitting, the Single-User (SU) decoder will be used. When two
nodes are transmitting simultaneously, the AP receives signals containing
two superimposed packets. In this case, the AP decodes using two
decoders: the MUD decoder and the PNC decoder. The MUD decoder attempts
to decode both packets $C_i^A$ and $C_i^B$ explicitly, and the PNC
decoder attempts to decode a linear combination of the two packets
$C_i^A$ and $C_i^B$, i.e., $C_i^A \oplus C_i^B$. The successfully decoded
packets from the PHY layer in different time slots are collected and
passed to the MAC layer for further processing. With the help of the
MAC-layer RS code, the AP decodes the original messages $M^A$ and $M^B$,
as elaborated in Part B below.

B. An Example 

Different from traditional multipacket reception systems where only MUD
was adopted [1], a main distinguishing feature of NCMA is that it
combines PNC decoding with MUD to improve the system throughput. In
particular, it is possible that sometimes only $C_i^A \oplus C_i^B$ can
be decoded using PNC decoding while MUD fails to recover either $C_i^A$
or $C_i^B$. In this subsection, we illustrate the advantages and the key
idea of NCMA using a simple example.

PHY-layer Bridging: Let us focus on the PHY-layer decoding outcomes
first. In a time slot $i$, for the MUD decoder, there are four possible
outcomes: (i) both $C_i^A$ and $C_i^B$ are successfully decoded; (ii)
only $C_i^A$ is successfully decoded; (iii) only $C_i^B$ is successfully
decoded; (iv) neither $C_i^A$ nor $C_i^B$ can be decoded. For the PNC
decoder, there are two possible outcomes: (a) $C_i^A \oplus C_i^B$ is
successfully decoded; (b) $C_i^A \oplus C_i^B$ cannot be decoded. (In
this paper, we only consider the bit-wise eXclusive-OR (XOR), $\oplus$,
operation of $C_i^A$ and $C_i^B$.) As a result, we have
$4 \times 2 = 8$ possible outcomes. For explanation purposes, Fig. 4
shows an example in which the eight possible outcomes occur in eight
successive time slots. In time slots 3 and 4, $C_3^A$ and
$C_3^A \oplus C_3^B$ (abbreviated as $C_3^{A \oplus B}$), and $C_4^B$ and
$C_4^A \oplus C_4^B$ (abbreviated as $C_4^{A \oplus B}$), are decoded,
respectively. The "complementary" XOR packets $C_3^{A \oplus B}$ and
$C_4^{A \oplus B}$ can be used to recover the individual missing packets
$C_3^B$ and $C_4^A$. This process, which leverages the complementary XOR
packets, is referred to as PHY-layer bridging [?].
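
As a minimal illustration of PHY-layer bridging, the following Python sketch recovers a missing native packet from the other native packet and the complementary XOR packet. The packet contents and the helper name `xor_packets` are hypothetical and chosen only for illustration; they are not taken from the NCMA implementation.

```python
def xor_packets(p: bytes, q: bytes) -> bytes:
    """Bit-wise XOR of two equal-length packets."""
    if len(p) != len(q):
        raise ValueError("packets must have equal length")
    return bytes(a ^ b for a, b in zip(p, q))


# Hypothetical native packets of nodes A and B in one time slot.
c_a = bytes([0x12, 0x34, 0x56, 0x78])
c_b = bytes([0x9A, 0xBC, 0xDE, 0xF0])

# The packet a successful PNC decoder would deliver.
c_xor = xor_packets(c_a, c_b)

# Suppose MUD decoded only the packet of node A in this slot.
# XORing it with the complementary XOR packet yields the packet of node B.
recovered_b = xor_packets(c_a, c_xor)
```

The same call with the roles of A and B swapped recovers the packet of node A when only the packet of node B is available.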

MAC-layer Bridging: In Fig. 4, PHY-layer bridging cannot be applied to
time slot 7 because neither native packet $C_7^A$ nor $C_7^B$ is
available, and only a "lone" network-coded packet $C_7^{A \oplus B}$
(namely, a PNC packet) is decoded. In NCMA, such lone PNC packets turn
out to be useful in MAC-layer decoding. Fig. 5 gives an example
illustrating the main idea. Fig. 5(a) shows the PHY-layer decoding
outcomes for a number of successive time slots. We assume the AP has
recovered enough native packets $C_i^A$ to decode $M^A$ with the help of
the MAC-layer RS code by time slot 5; in this example, L = 3 PHY-layer
packets are needed to recover $M^A$. With MAC-layer decoding, the native
packets of node A that were missed at the PHY layer, including $C_2^A$,
can also be decoded (conceptually, we could obtain them by re-encoding
the recovered $M^A$ at the MAC layer, although in practice a simpler
process is possible). Note that the PHY layer failed to obtain $C_2^A$ in
time slot 2, but the MAC layer recovers it in time slot 5. With $C_2^A$,
the original lone PNC packet $C_2^A \oplus C_2^B$ in time slot 2 becomes
a complementary XOR packet. Consequently, we can recover $C_2^B$ (using
$C_2^A$ and $C_2^A \oplus C_2^B$), and therefore node B now has enough
native packets (i.e., L = 3) to recover message $M^B$, as shown in
Fig. 5(b). We refer to this process as MAC-layer bridging [?], [?].
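
The interplay of PHY-layer and MAC-layer bridging can be sketched as a small fixed-point iteration. The sketch below is only a conceptual model under stated assumptions: the RS code is treated abstractly (any L native packets of a node are assumed sufficient to recover its message, after which all of its native packets are known), and the function name `bridge` and the slot contents are hypothetical rather than the pattern of Fig. 5.

```python
def bridge(slots, L):
    """Apply PHY- and MAC-layer bridging until nothing new is recovered.

    slots: list of sets over {"A", "B", "X"}; "A"/"B" mark a decoded native
           packet of that node, "X" marks a decoded XOR (PNC) packet.
    L:     number of native packets needed to decode a message
           (abstract RS model; an assumption of this sketch).
    Returns (set of nodes whose messages are recovered, final slot contents).
    """
    slots = [set(s) for s in slots]
    recovered = set()
    changed = True
    while changed:
        changed = False
        # PHY-layer bridging: XOR packet plus one native packet gives the other.
        for s in slots:
            if "X" in s and len(s & {"A", "B"}) == 1:
                s |= {"A", "B"}
                changed = True
        # MAC-layer decoding: L native packets recover the whole message,
        # which in turn makes every native packet of that node available.
        for node in ("A", "B"):
            if node not in recovered and sum(node in s for s in slots) >= L:
                recovered.add(node)
                for s in slots:
                    s.add(node)
                changed = True
    return recovered, slots


# Hypothetical outcomes: slot 2 holds only a lone XOR packet.
slots = [{"A", "B"}, {"X"}, {"A"}, {"B"}, {"A"}]
recovered, final_slots = bridge(slots, L=3)
```

In this hypothetical run, node A reaches L = 3 native packets first; the recovered packet of node A in slot 2 then turns the lone XOR packet into a complementary one, which gives node B its third native packet.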

IV. Single Antenna System

Despite the throughput improvement brought about by PHY-layer bridging
and MAC-layer bridging, the previous NCMA systems [?], [?] operate with
BPSK modulation only. This upper-bounds the throughput at two decoded
bits per channel use (one decoded bit from each user). At medium to high
signal-to-noise ratio (SNR), e.g., SNR > 10 dB, it is desirable to use
high-order modulations to further boost the throughput. This paper
considers high-order modulations to avoid the saturation of data rate
[?].

In [?], both PHY-layer and MAC-layer real-time decoders have been
evaluated on the software-defined radio platform. At the PHY layer, the
PNC decoder is an XOR-CD decoder and the MUD decoder is an MUD-CD
decoder. XOR-CD first demodulates the overlapped signals into the XOR of
the channel-coded bit pairs of A and B. After that, channel decoding is
applied to obtain the XOR of the source bit pairs of A and B. MUD-CD, on
the other hand, first demodulates the overlapped signals into separate
channel-coded bits of A and B. After that, channel decoding is applied on
each of the streams to obtain the source bits of A and B (details of
XOR-CD and MUD-CD can be found in Parts A and B below). A salient feature
of these decoders is that the standard low-complexity point-to-point
binary Viterbi channel decoder can be used with changes on the
demodulators only, as shown in Fig. 6. However, as will be elaborated,
both PNC and MUD decoders encounter a critical "phase offset" problem
when we move from BPSK to high-order modulations. In the following, we
illustrate this problem in XOR-CD and MUD-CD assuming QPSK.

Fig. 6: NCMA PHY-layer PNC decoder (XOR-CD) and MUD decoder (MUD-CD) for
simultaneous transmissions from two nodes.

A. Phase Penalty for PNC Decoder

Several decoding approaches are possible for channel-coded PNC systems
[?]. In this paper, since we aim for real-time operations rather than
optimal performance, we use the simple XOR-CD decoder as the PNC decoder.
Sophisticated PNC decoders with better BER performance are possible, at
the cost of high computational complexity and large decoding delays,
e.g., Jt-CNC (joint channel-decoding and network-coding) [?]. They have
been studied in the literature, and we refer interested readers to [?]
for further details.

The general architecture of XOR-CD is shown in the upper block of Fig. 6.
Let $V^A = (v_A[1], \ldots, v_A[n], \ldots)$ denote the PHY-layer
codeword of node A in one time slot (i.e., one binary encoded packet),
where $v_A[n] \in \{0, 1\}$ is the n-th convolutional encoded bit
(similarly, we have $V^B = (v_B[1], \ldots, v_B[n], \ldots)$ for node B).
Assuming QPSK modulation, the k-th modulated symbol $x_A[k]$ for the
PHY-layer transmitted packet $X^A = (x_A[1], \ldots, x_A[k], \ldots)$ is
given by

$$x_A[k] = (1 - 2v_A[2k-1]) + (1 - 2v_A[2k])j, \quad k = 1, 2, 3, \ldots \tag{1}$$

Let us assume an OFDM system where multipath fading can be dealt with by
the cyclic prefix (CP). The k-th received sample in the frequency domain
at the AP can be written as

$$y_R[k] = h_A[k]x_A[k] + h_B[k]x_B[k] + w[k], \tag{2}$$

where $w[k]$ is the noise term, and $h_A[k]$, $h_B[k]$ are the channel
gains of the k-th samples of nodes A and B, respectively. In XOR-CD, the
received samples $\{y_R[k]\}_{k=1,2,3,\ldots}$ are first passed through
the PNC demodulator to obtain the XOR bits
$\{v_A[n] \oplus v_B[n]\}_{n=1,2,\ldots}$. Note that the outputs
$\{v_A[n] \oplus v_B[n]\}_{n=1,2,\ldots}$ from the PNC demodulator can be
hard or soft bits (i.e., log-likelihood ratios of the bits). The decoders
implemented by us in this work, and the BER performance results therein,
assume the use of soft bits. These bits are then fed to a standard binary
Viterbi decoder (as used in a point-to-point system) to decode the
network-coded packet $C_i^A \oplus C_i^B$. The standard Viterbi decoder
can be used for PNC decoding because XOR-CD exploits the linearity of
linear channel codes, such as convolutional codes. Specifically, define
$\Pi(\cdot)$ as the convolutional channel encoder. Since $\Pi(\cdot)$ is
linear, we have, dropping the time-slot index,

$$V^A \oplus V^B = \Pi(C^A) \oplus \Pi(C^B) = \Pi(C^A \oplus C^B).$$
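
The linearity property can be checked numerically. The sketch below assumes a rate-1/2 feedforward convolutional encoder with octal generators 133 and 171 and constraint length 7, starting from the all-zero state; the bit-ordering convention inside the shift register is an assumption of this sketch and may differ from a standard-compliant encoder, but linearity over GF(2) holds either way. The name `conv_encode` is hypothetical.

```python
import random


def conv_encode(bits, gens=(0o133, 0o171), K=7):
    """Rate-1/len(gens) feedforward convolutional encoder over GF(2).

    The shift register starts at zero; the newest input bit occupies the
    least significant position (a convention assumed for this sketch).
    """
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)
        for g in gens:
            # Output bit = parity of the tapped register positions.
            out.append(bin(state & g).count("1") % 2)
    return out


random.seed(0)
c_a = [random.randint(0, 1) for _ in range(32)]  # hypothetical source bits of A
c_b = [random.randint(0, 1) for _ in range(32)]  # hypothetical source bits of B

v_a = conv_encode(c_a)
v_b = conv_encode(c_b)

# XOR of the two codewords versus the codeword of the XORed source bits.
xor_of_codewords = [a ^ b for a, b in zip(v_a, v_b)]
codeword_of_xor = conv_encode([a ^ b for a, b in zip(c_a, c_b)])
```

Because the two sequences coincide, a decoder for the original code can be run directly on the XORed coded bits to recover the XORed source bits, which is the property XOR-CD relies on.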

With respect to node A, the odd (even) bits of $V^A$ are mapped to the
in-phase (quadrature) part of $x_A[k]$ in QPSK, i.e.,
$x_A^I[k] = 1 - 2v_A[2k-1]$ ($x_A^Q[k] = 1 - 2v_A[2k]$). A particular
pair of symbols from the two nodes is expressed as
$(x_A[k], x_B[k]) = (x_A^I[k] + x_A^Q[k]j, \; x_B^I[k] + x_B^Q[k]j)$.

An important issue in PNC is how to calculate $x_A[k] \oplus x_B[k]$
(abbreviated as $x_{A \oplus B}[k]$) using the received sample $y_R[k]$
in (2). To maintain the linear property of convolutional codes, in XOR-CD
we need to map the in-phase part of $x_A[k]$ with the in-phase part of
$x_B[k]$, i.e., $x_A^I[k]$ with $x_B^I[k]$ (similarly, $x_A^Q[k]$ with
$x_B^Q[k]$), into the network-coded in-phase (quadrature) part of
$x_{A \oplus B}[k]$. More precisely, the PNC mapping in XOR-CD for
$x_{A \oplus B}[k]$ is defined as

$$x_{A \oplus B}[k] = x_A^I[k] \oplus x_B^I[k] + \left(x_A^Q[k] \oplus x_B^Q[k]\right) j, \tag{3}$$

where $x_A^I[k] \oplus x_B^I[k] = x_A^I[k] x_B^I[k]$ and
$x_A^Q[k] \oplus x_B^Q[k] = x_A^Q[k] x_B^Q[k]$, given that
$x_A^I[k], x_A^Q[k], x_B^I[k], x_B^Q[k] \in \{1, -1\}$. The PNC
demodulation rule for the XORed bits is defined as

$$v_A[2k-1] \oplus v_B[2k-1] = \frac{1 - x_A^I[k] \oplus x_B^I[k]}{2}, \qquad
v_A[2k] \oplus v_B[2k] = \frac{1 - x_A^Q[k] \oplus x_B^Q[k]}{2}. \tag{4}$$
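
The mapping chain of (1), (3) and (4) can be traced on a short example. The sketch below works on ideal, noise-free symbols and only checks that the rules are mutually consistent; the function names and the example bits are hypothetical.

```python
def qpsk_mod(v):
    """Eq. (1): map each coded bit pair (v[2k-1], v[2k]) to a QPSK symbol."""
    return [complex(1 - 2 * v[i], 1 - 2 * v[i + 1]) for i in range(0, len(v), 2)]


def pnc_map(xa, xb):
    """Eq. (3): component-wise XOR, realized as a product of +/-1 values."""
    return complex(xa.real * xb.real, xa.imag * xb.imag)


def pnc_demod(x_xor):
    """Eq. (4): map a network-coded symbol back to the two XORed bits."""
    return [int((1 - x_xor.real) / 2), int((1 - x_xor.imag) / 2)]


# Hypothetical coded bits of nodes A and B (two QPSK symbols each).
v_a = [0, 1, 1, 0]
v_b = [1, 1, 0, 0]

x_a = qpsk_mod(v_a)
x_b = qpsk_mod(v_b)

xor_bits = []
for sa, sb in zip(x_a, x_b):
    xor_bits.extend(pnc_demod(pnc_map(sa, sb)))

expected = [a ^ b for a, b in zip(v_a, v_b)]
```

Here `xor_bits` equals the bit-wise XOR of the two coded bit streams, confirming that the product of the +/-1 components implements the XOR of the underlying bits.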


Let us focus on one particular received sample
$y_R[k] = x_A[k] + x_B[k]e^{j\Delta\phi}$, where $\Delta\phi$ is the
relative phase offset between the two nodes (to explain things in simple
terms, here we assume perfect power control so that the received signal
powers for both nodes are equal, and we have $h_A[k] = 1$ and
$h_B[k] = e^{j\Delta\phi}$; in general, this need not be the case). Fig. 7
plots the noise-free constellation map for $\Delta\phi = \pi/2$, in which
some constellation points overlap with the others (note: to see the
overlapping constellation points more clearly in the figure, we purposely
set $\Delta\phi$ to be slightly smaller than $\pi/2$). In Fig. 7,
constellation points of the same color are mapped to the same XOR value.
Note, for example, that the constellation points of symbol pairs
$(1-j, 1-j)$ and $(1+j, -1-j)$ overlap, but they are mapped to different
XOR values. In particular, when $\Delta\phi = \pi/2$, the XOR mapping of
(3) leads to ambiguity even in the absence of noise, and the error
probability for the network-coded symbol can be as high as 50%.

Fig. 7: Constellation map for the received samples at the AP for
$|h_A| = |h_B| = 1$ and relative phase offset $\Delta\phi = \pi/2$. Note
that we purposely set $\Delta\phi$ to be slightly smaller than $\pi/2$ to
highlight different PNC mappings using different colors, where the same
color corresponds to the same network-coded symbol. The symbol pair
$(x_A, x_B)$ denotes the MUD demodulated symbols.

Fig. 8: BER results for single-antenna and double-antenna NCMA systems
with different relative phase offsets in AWGN channels: XOR-CD and MUD-CD
decoders with (a) single antenna (QPSK), (b) double antennas with QPSK,
benchmarked by ZF and MMSE decoders, and (c) double antennas with 16-QAM.
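
The ambiguity can be reproduced by enumerating the 16 noise-free received points. The sketch below assumes the simplified model above ($h_A = 1$, $h_B = e^{j\Delta\phi}$, no noise) and lists the received points that are shared by symbol pairs with different network-coded symbols under mapping (3); the function name `ambiguous_points` is hypothetical.

```python
import cmath
import math
from itertools import product

# The four QPSK symbols of Eq. (1).
QPSK = [complex(i, q) for i in (1, -1) for q in (1, -1)]


def ambiguous_points(delta_phi):
    """Noise-free received points hit by pairs with different XOR symbols."""
    groups = {}
    for xa, xb in product(QPSK, QPSK):
        y = xa + xb * cmath.exp(1j * delta_phi)
        # Round to merge points that coincide up to floating-point error.
        key = (round(y.real, 6), round(y.imag, 6))
        xor_symbol = complex(xa.real * xb.real, xa.imag * xb.imag)
        groups.setdefault(key, set()).add(xor_symbol)
    return {k: v for k, v in groups.items() if len(v) > 1}


at_half_pi = ambiguous_points(math.pi / 2)
at_quarter_pi = ambiguous_points(math.pi / 4)
```

Under these assumptions, `at_half_pi` contains the point 2 (shared by the pairs $(1-j, 1-j)$ and $(1+j, -1-j)$ discussed above) together with the other overlapping points, whereas `at_quarter_pi` is empty because all 16 received points are distinct at $\Delta\phi = \pi/4$.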

From the simulation results in Fig. 8(a), we can see that the BER
performance of XOR-CD with $\Delta\phi = \pi/2$ degrades greatly compared
with that of $\Delta\phi = \pi/4$. In general, we find that the BER
performance of XOR-CD depends heavily on the relative phase offset
$\Delta\phi$. As will be discussed in the next subsection, the BER
performance of MUD-CD is also highly correlated with the relative phase
offset $\Delta\phi$.


B. Phase Penalty for MUD Decoder 

The architecture of MUD-CD is shown in the lower block of Fig. 6. The
goal of the MUD-CD decoder is to decode the two source packets $C_i^A$
and $C_i^B$ separately. In this process, the received samples
$\{y_R[k]\}_{k=1,2,3,\ldots}$ are first passed through a MUD demodulator
to get the binary channel-encoded bits $\{v_A[n]\}_{n=1,2,\ldots}$ and
$\{v_B[n]\}_{n=1,2,\ldots}$, which are then fed into two binary Viterbi
decoders to recover the packets $C_i^A$ and $C_i^B$ of nodes A and B,
respectively.

A phase penalty similar to that in XOR-CD also exists in MUD-CD, as shown
in Fig. 8(a). Let us use the constellation map in Fig. 7 to explain the
phase penalty problem for MUD-CD. With respect to the particular
constellation point "2", we cannot distinguish between the symbol pair
$(1-j, 1-j)$ and the symbol pair $(1+j, -1-j)$ based on the received
sample $y_R[k]$ when $\Delta\phi = \pi/2$.

From the BER curves in Fig. 8(a), we can see that the BER performance of
MUD-CD, like that of XOR-CD, depends on the relative phase offset
$\Delta\phi$ between the two nodes: the BER performances of both the PNC
and MUD decoders degrade drastically when $\Delta\phi = \pi/2$. In
general, when $\Delta\phi$ is in the range
$\pi/4 < \Delta\phi < 3\pi/4$ or $5\pi/4 < \Delta\phi < 7\pi/4$, XOR-CD
has poor performance; when $\Delta\phi = m\pi/2$, $m = 0, 1, 2, \ldots$,
the overlapping constellation points degrade the performance of the
MUD-CD decoder. This is a hurdle for NCMA systems when QPSK is adopted.
This hurdle can be overcome with the use of multiple antennas at the AP,
as explained in Section IV-C.

C. Possible Solution to Alleviate Phase Penalty 

The fundamental reason why the BER performance of NCMA is bad when
$\Delta\phi = \pi/2$ is that some overlapping constellation points are
mapped to different network-coded symbols in the case of PNC decoding,
and can be demodulated into two different symbol pairs in the case of MUD
(i.e., there is an ambiguity even in the absence of noise). In the
literature, several approaches have been proposed to partially solve the
phase penalty problem for PNC systems. For example, [8] proposed a
real-to-imaginary PNC mapping for $x_{A \oplus B}$ (i.e., pairing the
in-phase part of one node's symbol with the quadrature part of the other
node's symbol, rather than using (3)) when the phase offset is
$\pi/4 < \Delta\phi < 3\pi/4$ and the data are not channel-coded.
However, this method is not applicable to channel-coded PNC systems
because the linearity of channel codes will be destroyed by such a PNC
mapping (i.e., the PNC-mapped symbols do not constitute a valid codeword
anymore). Another possible method was put forth by [?], in which a QPSK
to 5-QAM demodulation was adopted. This method, however, is also for
non-channel-coded systems and cannot be easily extended to channel-coded
XOR-CD systems. Refs. [?] studied a high computational-complexity Jt-CNC
(joint channel-decoding and network-coding) approach that makes real-time
processing difficult.

Nowadays, many APs are equipped with multiple antennas [34]. In this
paper, we consider an NCMA system in which the AP has two antennas,
referred to as MIMO-NCMA. We show via simulations and experiments that
MIMO-NCMA can solve the phase penalty problem while maintaining the low
complexity of non-iterative PHY-layer decoders.

In Section V, we show that MIMO techniques can solve the phase penalty
issue in QPSK-modulated NCMA systems, using bit-level decoding as in
point-to-point systems.

In Section VI, we show that just using MIMO and bit-level decoding may
not be good enough when higher-order modulations beyond QPSK are adopted,
e.g., 16-QAM. Fortunately, replacing bit-level decoding with symbol-level
decoding greatly improves the performance.


V. MIMO-NCMA: QPSK with Bit-level Decoding 

This section presents bit-level PHY-layer decoders for MIMO-NCMA with
QPSK modulation: Section V-A focuses on the design of the XOR-CD decoder,
and Section V-B on the MUD-CD decoder. For the PHY-layer channel codes,
our system adopts the same $[133, 171]_8$ convolutional code as in the
802.11 standard [34]. We present a low-complexity demodulation scheme
that is compatible with the standard point-to-point bit-level Viterbi
decoder that admits individual bits' information as inputs, e.g., a QPSK
symbol is broken into two bits as the Viterbi decoder's inputs.


A. PNC Decoder (XOR-CD) 

Let the received samples on the two antennas at the AP be $\{y_{R1}[k]\}_{k=1,2,3,\ldots}$ and $\{y_{R2}[k]\}_{k=1,2,3,\ldots}$, respectively. Our target is to compute the log-likelihood ratios (LLRs) of the two bits $v_A[2k-1] \oplus v_B[2k-1]$ and $v_A[2k] \oplus v_B[2k]$ based on the $k$-th received samples $y_{R1}[k]$ and $y_{R2}[k]$ of antennas 1 and 2. The PNC demodulator's outputs (namely, the soft information of $\{v_A[n] \oplus v_B[n]\}_{n=1,2,\ldots}$) are fed into the Viterbi decoder. This subsection derives the soft information of $\{v_A[n] \oplus v_B[n]\}_{n=1,2,\ldots}$. We assume the end nodes use QPSK modulation and express the transmitted symbols as $x_A = x_A^I + x_A^Q j$ and $x_B = x_B^I + x_B^Q j$. The received frequency-domain samples (our NCMA system is an OFDM system) 0 on the two antennas of the AP are

$$y_{R1} = h_{A1} x_A + h_{B1} x_B + w_1,$$
$$y_{R2} = h_{A2} x_A + h_{B2} x_B + w_2, \tag{5}$$

where $h_{A1}$ and $h_{B1}$ ($h_{A2}$ and $h_{B2}$) are the uplink channel gains of nodes A and B associated with the first (second) antenna, respectively, and $w_1$, $w_2$ are additive white Gaussian noises (AWGN) with variances $\sigma_1^2$ and $\sigma_2^2$.

We next consider how to reduce the 16 constellation points of an overlapped QPSK joint symbol to the log-likelihood ratios of two binary XOR bits. Doing so reduces the complexity of the decoder design. We consider the in-phase XOR bit here; a similar procedure applies to the quadrature XOR bit. Define the in-phase component's LLR of packet A's QPSK symbol (i.e., $x_A$) as $\log(P_A/Q_A)$, where $P_A$ and $Q_A$ are the probabilities of the in-phase component of $x_A$ being 1 and $-1$, respectively. Similarly, for $\mathrm{LLR}(x_A^I \oplus x_B^I)$, $P_{A\oplus B}$ and $Q_{A\oplus B}$ are the probabilities corresponding to $x_A^I \oplus x_B^I = 1$ and $x_A^I \oplus x_B^I = -1$. We have

$$\mathrm{LLR}(x_A^I \oplus x_B^I) = \log P_{A\oplus B} - \log Q_{A\oplus B} = \log\Pr(x_A^I \oplus x_B^I = 1 \mid y_{R1}, y_{R2}) - \log\Pr(x_A^I \oplus x_B^I = -1 \mid y_{R1}, y_{R2}). \tag{6}$$


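Out of the 16 constellation points associated with the symbol pair $(x_A, x_B)$, eight correspond to $x_A^I \oplus x_B^I = 1$ (the red dots in Fig. 9), and eight correspond to $x_A^I \oplus x_B^I = -1$ (the blue dots in Fig. 9). Let $\chi_{x_{A\oplus B}^I=1}$ denote the set of symbol pairs $(x_A, x_B)$ that satisfy $x_A^I \oplus x_B^I = 1$.^ We can express $P_{A\oplus B}$ as

$$P_{A\oplus B} = \Pr(x_A^I \oplus x_B^I = 1 \mid y_{R1}, y_{R2}) \propto \sum_{(x_A,x_B)\in\chi_{x_{A\oplus B}^I=1}} \exp\left\{-\frac{|y_{R1} - h_{A1}x_A - h_{B1}x_B|^2}{2\sigma_1^2}\right\} \cdot \exp\left\{-\frac{|y_{R2} - h_{A2}x_A - h_{B2}x_B|^2}{2\sigma_2^2}\right\}. \tag{7}$$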


We compute $Q_{A\oplus B}$ in a similar way based on the set $\chi_{x_{A\oplus B}^I=-1}$, and substitute $P_{A\oplus B}$ and $Q_{A\oplus B}$ into the LLR expression of (6). Fig. 9 plots the constellation maps of the two antennas with the same uplink channel-gain magnitude but with relative phase offsets $\Delta\phi_1 = 30^\circ$ and $\Delta\phi_2 = 100^\circ$ on antennas 1 and 2. Constellation points of sets $\chi_{x_{A\oplus B}^I=1}$ and $\chi_{x_{A\oplus B}^I=-1}$ are marked by red and blue dots, respectively. In MIMO-NCMA, for further simplification, when computing $\mathrm{LLR}(x_A^I \oplus x_B^I)$, we first reduce the number of constellation points from 16 to 2, i.e., we choose only one constellation point in $\chi_{x_{A\oplus B}^I=1}$ and one in $\chi_{x_{A\oplus B}^I=-1}$ (see the red and blue arrows of Fig. 9). The two selected constellation points correspond to the most likely points representing the two different XOR values of $x_A^I \oplus x_B^I$. After that, we compute the LLR based on the two selected constellation points (see the dashed blocks between the two figures). We refer to this demodulation procedure as reduced-constellation demodulation.

Reduced-constellation Demodulation for Two Antennas 


^The set $\chi_{x_{A\oplus B}^I=1}$ contains eight constellation points originated from the symbol pair $(x_A, x_B)$, namely, $(1+j, 1+j)$, $(1+j, 1-j)$, $(-1+j, -1+j)$, $(-1+j, -1-j)$, $(1-j, 1+j)$, $(1-j, 1-j)$, $(-1-j, -1+j)$ and $(-1-j, -1-j)$.

[Figure: Constellation Maps for Illustrating Reduced Constellation for PNC Decoder]

Fig. 9: Illustration of the reduced-constellation scheme for PNC to obtain the soft information on $x_A^I \oplus x_B^I$: (a) Constellation map for antenna 1 with relative phase offset $\Delta\phi_1 = 30^\circ$; (b) Constellation map for antenna 2 with relative phase offset $\Delta\phi_2 = 100^\circ$. The black stars represent the received samples; red dots represent constellation points in $\chi_{x_{A\oplus B}^I=1}$ and blue dots those in $\chi_{x_{A\oplus B}^I=-1}$. Red and blue solid lines represent the Euclidean distances from received samples to the constellation points originated from the sets $\chi_{x_{A\oplus B}^I=1}$ and $\chi_{x_{A\oplus B}^I=-1}$. We select the two constellation points by jointly considering the two maps, i.e., by the minimum sum of the two squared Euclidean distances (dashed lines). In this example, $(1-j, 1-j)$ is selected in $\chi_{x_{A\oplus B}^I=1}$, and $(1-j, -1-j)$ is selected in $\chi_{x_{A\oplus B}^I=-1}$. These two selected constellation points correspond to two different values of $x_A^I \oplus x_B^I$.


We assume the noise variances $\sigma_1^2$ and $\sigma_2^2$ are the same, $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Note that, in real wireless systems, $\sigma_1^2$ and $\sigma_2^2$ may not be equal; however, our derivations below can be easily generalized to deal with the case $\sigma_1^2 \neq \sigma_2^2$. We adopt the log-max approximation, $\log(\sum_i \exp(z_i)) \approx \max_i z_i$, to simplify the LLR calculation. For example, $\log P_{A\oplus B}$ can be expressed as (9), where $d_1(y_{R1}) = |y_{R1} - h_{A1}x_A - h_{B1}x_B|$ and $d_1(y_{R2}) = |y_{R2} - h_{A2}x_A - h_{B2}x_B|$ are the Euclidean distances from $y_{R1}$ and $y_{R2}$ to the constellation point $(x_A, x_B)$ in $\chi_{x_{A\oplus B}^I=1}$. The physical meaning of (9) can be understood to be selecting the one point with the minimum $d_1^2(y_{R1}) + d_1^2(y_{R2})$ value among all symbol pairs in set $\chi_{x_{A\oplus B}^I=1}$.
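As a numerical illustration of the log-max approximation, the following minimal Python sketch compares the exact log-sum-exp with its dominant term. The exponent values are hypothetical, chosen only to mimic eight candidate points at different distances from the received sample; they are not taken from the experiments.

```python
import math

def log_sum_exp(zs):
    """Exact log(sum_i exp(z_i)), computed in a numerically stable way."""
    m = max(zs)
    return m + math.log(sum(math.exp(z - m) for z in zs))

def log_max(zs):
    """Log-max approximation: keep only the dominant exponent."""
    return max(zs)

# Hypothetical exponents -d^2 / (2 * sigma^2) for eight candidate points
# (illustrative values only).
z = [-0.4, -6.1, -7.5, -9.0, -12.3, -15.8, -20.2, -25.0]

exact = log_sum_exp(z)
approx = log_max(z)
gap = exact - approx  # always lies in [0, log(len(z))]
print(f"exact={exact:.4f} approx={approx:.4f} gap={gap:.4f}")
```

The gap shrinks as the closest point becomes more dominant, which is the regime in which replacing the sum by a single selected constellation point costs little.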

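Similarly, define $d_{-1}(y_{R1})$ and $d_{-1}(y_{R2})$ as the Euclidean distances from $y_{R1}$ and $y_{R2}$ to points in $\chi_{x_{A\oplus B}^I=-1}$, respectively. In Fig. 9, $(1-j, 1-j)$ is selected among the points in $\chi_{x_{A\oplus B}^I=1}$, and $(1-j, -1-j)$ is selected among the points in $\chi_{x_{A\oplus B}^I=-1}$, to represent the cases of $x_A^I \oplus x_B^I = 1$ and $x_A^I \oplus x_B^I = -1$, respectively. The approximation of $\mathrm{LLR}(x_A^I \oplus x_B^I)$ is

$$\mathrm{LLR}(x_A^I \oplus x_B^I) \approx \min_{(x_A,x_B)\in\chi_{x_{A\oplus B}^I=1}} \{d_1^2(y_{R1}) + d_1^2(y_{R2})\} - \min_{(x_A,x_B)\in\chi_{x_{A\oplus B}^I=-1}} \{d_{-1}^2(y_{R1}) + d_{-1}^2(y_{R2})\}. \tag{8}$$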


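The QPSK demodulation from $x_A^I[k] \oplus x_B^I[k]$ to $v_A[2k-1] \oplus v_B[2k-1]$ is a one-to-one mapping (see the QPSK mapping given earlier in the paper), and the following LLR relationship holds:

$$\mathrm{LLR}(v_A[2k-1] \oplus v_B[2k-1]) = \mathrm{LLR}(x_A^I[k] \oplus x_B^I[k]). \tag{10}$$

Similarly, for the even-indexed input bits, $\mathrm{LLR}(v_A[2k] \oplus v_B[2k])$ is $\mathrm{LLR}(x_A^Q[k] \oplus x_B^Q[k])$.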
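The reduced-constellation procedure for the in-phase XOR bit can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the prototype: the function names, the default noise variance, and the example channel gains (unit magnitude, relative phase offsets of 30 and 100 degrees) are assumptions. The sketch also fixes one particular sign and scaling convention, log P minus log Q divided by 2σ², so that a positive value favours an XOR of +1; a decoder may equally use the opposite sign as long as it is consistent.

```python
import cmath
import math
from itertools import product

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def xor_inphase(xA, xB):
    """+/-1-domain XOR of the in-phase bits: +1 if they agree, -1 otherwise."""
    return 1 if xA.real == xB.real else -1

def xor_llr_inphase(y1, y2, hA1, hB1, hA2, hB2, sigma2=1.0):
    """Reduced-constellation LLR of the in-phase XOR bit from two antennas."""
    best = {1: math.inf, -1: math.inf}
    for xA, xB in product(QPSK, repeat=2):
        # Sum of squared Euclidean distances over the two antennas.
        d2 = (abs(y1 - hA1 * xA - hB1 * xB) ** 2
              + abs(y2 - hA2 * xA - hB2 * xB) ** 2)
        label = xor_inphase(xA, xB)
        best[label] = min(best[label], d2)  # keep one point per XOR value
    # Assumed convention: log P - log Q under the log-max approximation.
    return (best[-1] - best[1]) / (2.0 * sigma2)

# Hypothetical channel gains (assumption): equal magnitude, offsets 30 / 100 deg.
hA1, hB1 = 1.0, cmath.rect(1.0, math.radians(30))
hA2, hB2 = 1.0, cmath.rect(1.0, math.radians(100))

xA, xB = 1 - 1j, 1 - 1j            # in-phase components agree
llr_same = xor_llr_inphase(hA1 * xA + hB1 * xB, hA2 * xA + hB2 * xB,
                           hA1, hB1, hA2, hB2)

xA, xB = 1 - 1j, -1 - 1j           # in-phase components differ
llr_diff = xor_llr_inphase(hA1 * xA + hB1 * xB, hA2 * xA + hB2 * xB,
                           hA1, hB1, hA2, hB2)
print(llr_same, llr_diff)
```

With noise-free samples, the first value is positive and the second negative, reflecting the two XOR hypotheses.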


B. MUD Decoder (MUD-CD) 


The MUD decoder for MIMO-NCMA follows the same reduced-constellation principle as that of the PNC decoder, with the difference that its target is to obtain the individual soft information of $\{v_A[n]\}_{n=1,2,\ldots}$ and $\{v_B[n]\}_{n=1,2,\ldots}$ rather than their XOR. Without loss of generality, let us focus on the derivation of the soft information of packet A. $\mathrm{LLR}(x_A^I[k])$ and $\mathrm{LLR}(x_A^Q[k])$ are the soft information for $v_A[2k-1]$ and $v_A[2k]$, respectively. Based on the sets $\chi_{x_A^I=1}$^ and $\chi_{x_A^I=-1}$, and the same “log-max approximation” rule as in (9), we now have the approximation of $\mathrm{LLR}(x_A^I)$:

$$\mathrm{LLR}(x_A^I) \approx \min_{(x_A,x_B)\in\chi_{x_A^I=1}} \{d_1^2(y_{R1}) + d_1^2(y_{R2})\} - \min_{(x_A,x_B)\in\chi_{x_A^I=-1}} \{d_{-1}^2(y_{R1}) + d_{-1}^2(y_{R2})\}. \tag{11}$$


Fig. 10 shows a reduced-constellation example for MUD-CD using the same constellation map as Fig. 9. We note
that for the same constellation map and the same received 


^The set $\chi_{x_A^I=1}$ contains eight constellation points originated from the symbol pair $(x_A, x_B)$, namely, $(1+j, 1+j)$, $(1+j, 1-j)$, $(1+j, -1+j)$, $(1+j, -1-j)$, $(1-j, 1+j)$, $(1-j, 1-j)$, $(1-j, -1+j)$ and $(1-j, -1-j)$.

$$\log P_{A\oplus B} \propto \max_{(x_A,x_B)\in\chi_{x_{A\oplus B}^I=1}} \{-|y_{R1} - h_{A1}x_A - h_{B1}x_B|^2 - |y_{R2} - h_{A2}x_A - h_{B2}x_B|^2\} \propto \min_{(x_A,x_B)\in\chi_{x_{A\oplus B}^I=1}} \{d_1^2(y_{R1}) + d_1^2(y_{R2})\}. \tag{9}$$


[Figure: Constellation Maps for Illustrating Reduced Constellation for MUD Decoder. Panels: Antenna 1, $\Delta\phi_1 = 30^\circ$; Antenna 2, $\Delta\phi_2 = 100^\circ$]

Fig. 10: Illustration of the reduced-constellation scheme for MUD to obtain soft information on $x_A^I$. Red and blue solid lines represent the Euclidean distances from received samples to the constellation points originated from the sets $\chi_{x_A^I=1}$ and $\chi_{x_A^I=-1}$. In this example, $(1-j, -1-j)$ is selected in $\chi_{x_A^I=1}$, and a second point is selected in $\chi_{x_A^I=-1}$. These two selected constellation points correspond to two different values of $x_A^I$.


samples, the soft outputs of XOR-CD decoder and MUD-CD 
decoder are in general different. In a real wireless system, 
since the constellation map changes from one sample to 
another sample, sometimes the MUD decoder’s LLR may 
be higher, and sometimes the PNC’s LLR. NCMA can best 
utilize and extract useful information out of the received 
samples, by jointly making use of PNC and MUD decoders. 
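The MUD-CD demodulator differs from the XOR-CD demodulator only in how the 16 joint constellation points are partitioned. The following Python sketch makes this explicit by passing the partition as a labelling function. It is a sketch under stated assumptions: `reduced_llr`, `mud_llrs`, the example channel gains and the sign convention (positive favours +1) are illustrative choices, not the prototype's code.

```python
import cmath
import math
from itertools import product

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def reduced_llr(y, h, label, sigma2=1.0):
    """Two-point reduced-constellation LLR for any +/-1 labelling of (xA, xB).

    y = (y1, y2) are the samples on the two antennas;
    h = ((hA1, hB1), (hA2, hB2)) are the corresponding channel gains.
    """
    best = {1: math.inf, -1: math.inf}
    for xA, xB in product(QPSK, repeat=2):
        d2 = sum(abs(yr - hA * xA - hB * xB) ** 2
                 for yr, (hA, hB) in zip(y, h))
        key = label(xA, xB)
        best[key] = min(best[key], d2)
    # Assumed convention: positive LLR favours the label +1.
    return (best[-1] - best[1]) / (2.0 * sigma2)

def mud_llrs(y, h, sigma2=1.0):
    """LLRs of the individual in-phase / quadrature bits of packets A and B."""
    return {
        "A_I": reduced_llr(y, h, lambda a, b: int(a.real), sigma2),
        "A_Q": reduced_llr(y, h, lambda a, b: int(a.imag), sigma2),
        "B_I": reduced_llr(y, h, lambda a, b: int(b.real), sigma2),
        "B_Q": reduced_llr(y, h, lambda a, b: int(b.imag), sigma2),
    }

# Hypothetical channel gains (assumption): offsets of 30 and 100 degrees.
h = ((1.0, cmath.rect(1.0, math.radians(30))),
     (1.0, cmath.rect(1.0, math.radians(100))))
xA, xB = 1 - 1j, -1 - 1j
y = tuple(hA * xA + hB * xB for hA, hB in h)   # noise-free samples
llrs = mud_llrs(y, h)
print(llrs)
```

Passing `lambda a, b: 1 if a.real == b.real else -1` as the label reproduces the XOR-CD case, which shows that the two demodulators share the same distance computations and differ only in the final selection step.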

C. Simulation Results for Bit-level Decoding 

We now take a look at the simulation results in Fig. [^b), 
where two antennas and bit-level decoding are adopted. Fig. 
[^b) plots the BER curves of QPSK PNC and MUD decoders 
with different $\Delta\phi_1$ and $\Delta\phi_2$, where $\Delta\phi_1$ and $\Delta\phi_2$ are the
relative phase offsets at antenna 1 and antenna 2, respectively.
When $\Delta\phi_1 = \pi/2$, $\Delta\phi_2 = 0$, we can see that the BER
performances of both XOR-CD and MUD-CD decoders are
greatly improved compared with the single-antenna case with
a $\pi/2$ phase offset. The joint use of MIMO and bit-level
decoding solves the phase penalty issue for QPSK-modulated 
NCMA systems. 

However, for higher-order modulations, e.g., 16-QAM, bit- 
level decoding still incurs severe performance degradation, 
even with the use of MIMO. Fig.j^c) plots the BER curves of 
16-QAM bit-level PNC and MUD decoders. We can see from 
Fig. [^c) that even with perfect phase synchronization (i.e., 
$\Delta\phi_1 = \Delta\phi_2 = 0$), the 16-QAM bit-level XOR-CD decoder
reaches a BER floor. Fortunately, as will be discussed in 


the next section, using symbol-level decoding improves the 
performance greatly (also see Fig. [^c)). 

VI. MIMO-NCMA: 16-QAM with Symbol-level 
Decoding 

This section studies symbol-level decoding, which does away
with bit-level decoding to avoid information loss in the
demodulation process. To utilize the soft-symbol information, we
investigate the combination of symbol-level PNC and MUD 
demodulators with a symbol-level Viterbi decoder, by gen¬ 
eralizing the reduced-constellation algorithm. For simplicity 
and without loss of generality, we start with QPSK to explain 
the idea of symbol-level decoding and the potential perfor¬ 
mance degradation problem in bit-level decoders (although 
the performance degradation in QPSK is small when MIMO 
is used). After that, we extend the treatment from QPSK to 
16-QAM. 

A. QPSK Symbol-level NCMA Decoder 

The symbol-level NCMA decoder contains symbol-level
demodulators and symbol-level Viterbi decoders, as shown
in Fig. 11. For example, consider the PNC decoder. Based
on the received samples $y_{R1}$ and $y_{R2}$ from the two antennas
in (5), the QPSK symbol-level PNC demodulator generalizes
the reduced-constellation demodulation scheme by reducing
the 16 constellation points to 4 constellation points, i.e., we


























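$$\log\Pr(x_{A\oplus B} = 1+j \mid y_{R1}, y_{R2}) \propto \log \sum_{(x_A,x_B)\in\chi_{x_{A\oplus B}=1+j}} \exp\left\{-\frac{|y_{R1} - h_{A1}x_A - h_{B1}x_B|^2}{2\sigma^2}\right\} \exp\left\{-\frac{|y_{R2} - h_{A2}x_A - h_{B2}x_B|^2}{2\sigma^2}\right\}. \tag{12}$$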



[Figure 11 labels: $C^A \oplus C^B$, $C^A$, $C^B$]


Fig. 11: Symbol-level NCMA decoder flow graph, in which 
the red characters highlight the difference with respect to the 
bit-level decoder in Fig. 


reduce to a whole QPSK XOR symbol rather than two bit-wise XORs. The soft information for $x_{A\oplus B} = 1+j$, for example, is expressed as (12), where $\chi_{x_{A\oplus B}=1+j}$ denotes the set of symbol pairs $(x_A, x_B)$ satisfying $x_{A\oplus B} = 1+j$. We further express (12) using the “log-max” approximation:

$$\log\Pr(x_{A\oplus B} = 1+j \mid y_{R1}, y_{R2}) \propto \min_{(x_A,x_B)\in\chi_{x_{A\oplus B}=1+j}} \{d_{1+j}^2(y_{R1}) + d_{1+j}^2(y_{R2})\}, \tag{13}$$

where $d_{1+j}(y_{R1}) = |y_{R1} - h_{A1}x_A - h_{B1}x_B|$ ($d_{1+j}(y_{R2}) = |y_{R2} - h_{A2}x_A - h_{B2}x_B|$) is the Euclidean distance between $y_{R1}$ ($y_{R2}$) and the constellation point $(x_A, x_B)$ satisfying $x_{A\oplus B} = 1+j$. We can compute $\log\Pr(x_{A\oplus B} = -1+j \mid y_{R1}, y_{R2})$, $\log\Pr(x_{A\oplus B} = 1-j \mid y_{R1}, y_{R2})$ and $\log\Pr(x_{A\oplus B} = -1-j \mid y_{R1}, y_{R2})$ in a similar manner. The above describes the symbol-level demodulation process to obtain the soft XOR information for each QPSK symbol. This soft information is then fed to a symbol-level Viterbi decoder to decode the original XOR packet $C^A \oplus C^B$.
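The symbol-level demodulation described above, in which the 16 joint points are reduced to one representative per XOR symbol, can be sketched as follows. This Python sketch is illustrative only: the names `xor_symbol` and `symbol_metrics` and the example channel gains are assumptions, and the returned values are raw minimum squared-distance sums (smaller means more likely), before any scaling by the noise variance.

```python
import cmath
import math
from itertools import product

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def xor_symbol(xA, xB):
    """Component-wise +/-1 XOR: +1 where components agree, -1 where they differ."""
    i = 1.0 if xA.real == xB.real else -1.0
    q = 1.0 if xA.imag == xB.imag else -1.0
    return complex(i, q)

def symbol_metrics(y1, y2, hA1, hB1, hA2, hB2):
    """Minimum summed squared distance for each of the four XOR symbols."""
    best = {}
    for xA, xB in product(QPSK, repeat=2):
        d2 = (abs(y1 - hA1 * xA - hB1 * xB) ** 2
              + abs(y2 - hA2 * xA - hB2 * xB) ** 2)
        s = xor_symbol(xA, xB)
        best[s] = min(best.get(s, math.inf), d2)  # 16 points reduced to 4
    return best

# Hypothetical channel gains (assumption): offsets of 30 and 100 degrees.
hA1, hB1 = 1.0, cmath.rect(1.0, math.radians(30))
hA2, hB2 = 1.0, cmath.rect(1.0, math.radians(100))
xA, xB = 1 + 1j, 1 - 1j                       # XOR symbol is 1 - j
metrics = symbol_metrics(hA1 * xA + hB1 * xB, hA2 * xA + hB2 * xB,
                         hA1, hB1, hA2, hB2)
most_likely = min(metrics, key=metrics.get)
print(most_likely, metrics)
```

The four metrics are kept together as one soft symbol, rather than being collapsed into two independent bit LLRs, which is the property the symbol-level Viterbi decoder relies on.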


B. Performance Degradation Problem in Bit-level Decoders 

To motivate the use of the symbol-level decoder, we now elaborate on the performance degradation problem associated with the bit-level PNC decoder, assuming QPSK. For simplicity in our illustration, let us assume that the relative phase offsets between nodes A and B are $\pi/2$ for both antennas, and that the amplitudes of all the channel gains are 1. Furthermore, let us first neglect the noise and consider the reception of two specific noise-free constellation points $y[i] = 2$ and $y[i+1] = 2 + 2j$ at the AP. Note that given the same relative offset $\pi/2$, the received signals at the two antennas are the same. Thus, we consider the signal on one of the antennas here for our illustration. It is easy to see that $y[i] = 2$ may result from two symbol pairs $(1-j, 1-j)$ and $(1+j, -1-j)$ with equal probability, and will be mapped into $1+j$ and $-1-j$ under XOR mapping with probability 1/2, as follows:

$$\Pr(x_{A\oplus B}[i] = 1+j \mid y[i]) = \Pr(x_{A\oplus B}[i] = -1-j \mid y[i]) = 1/2. \tag{14}$$

On the other hand, $y[i+1]$ can only result from $(1+j, 1-j)$ and can be mapped to a unique XOR symbol $1-j$.

By contrast, note that the bit-level PNC demodulator splits a QPSK symbol into an in-phase bit and a quadrature bit. With respect to the same sample $y[i] = 2$, the bit-level PNC demodulator computes the probability of each bit independently as follows,

$$\Pr(x_{A\oplus B}^I[i] = 1 \mid y[i]) = \Pr(x_{A\oplus B}^I[i] = -1 \mid y[i]) = 1/2,$$
$$\Pr(x_{A\oplus B}^Q[i] = 1 \mid y[i]) = \Pr(x_{A\oplus B}^Q[i] = -1 \mid y[i]) = 1/2, \tag{15}$$

which is equivalent to

$$\Pr(x_{A\oplus B}[i] = 1+j \mid y[i]) = \Pr(x_{A\oplus B}[i] = -1-j \mid y[i]) = \Pr(x_{A\oplus B}[i] = 1-j \mid y[i]) = \Pr(x_{A\oplus B}[i] = -1+j \mid y[i]) = 1/4. \tag{16}$$

Comparing (16) with (14), it is obvious that the bit-level demodulator introduces ambiguity, which may cause performance degradation in the Viterbi decoding process. In particular, given that the four possible QPSK XORs are equally likely, there is no information contained in (16) (i.e., this amounts to feeding no information to the Viterbi decoder).
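The information loss in this example can be checked by direct enumeration. The following Python sketch assumes, as in the illustration above, unit channel gains with a relative phase offset of π/2 (so the noise-free sample is x_A + j·x_B) and equally likely symbol pairs; the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H_A, H_B = 1, 1j        # unit gains, relative phase offset of pi/2

def xor_symbol(xA, xB):
    """Component-wise +/-1 XOR of two QPSK symbols."""
    i = 1.0 if xA.real == xB.real else -1.0
    q = 1.0 if xA.imag == xB.imag else -1.0
    return complex(i, q)

def candidates(y):
    """All symbol pairs whose noise-free superposition equals y."""
    return [(a, b) for a, b in product(QPSK, repeat=2)
            if abs(H_A * a + H_B * b - y) < 1e-9]

def symbol_level_posterior(pairs):
    """Posterior of the whole XOR symbol (equally likely pairs assumed)."""
    counts = Counter(xor_symbol(a, b) for a, b in pairs)
    return {s: Fraction(c, len(pairs)) for s, c in counts.items()}

def bit_level_posterior(pairs):
    """Posterior implied by treating the I and Q XOR bits as independent."""
    n = len(pairs)
    p_i = Fraction(sum(xor_symbol(a, b).real == 1 for a, b in pairs), n)
    p_q = Fraction(sum(xor_symbol(a, b).imag == 1 for a, b in pairs), n)
    return {complex(i, q): (p_i if i == 1 else 1 - p_i)
                           * (p_q if q == 1 else 1 - p_q)
            for i in (1, -1) for q in (1, -1)}

pairs = candidates(2)
sym = symbol_level_posterior(pairs)    # mass 1/2 on 1+j and 1/2 on -1-j
bit = bit_level_posterior(pairs)       # mass 1/4 on each of the four symbols
nxt = symbol_level_posterior(candidates(2 + 2j))   # all mass on 1-j
print(sym, bit, nxt)
```

The symbol-level posterior rules out two of the four XOR symbols, whereas the product of the bit-level marginals spreads the probability uniformly over all four.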

The above explains the loss of information in the demodulation process. We now look at its impact on the decoding process. In particular, we examine the difference between the symbol-level and bit-level Viterbi decoders through the example shown in Fig. 12. Without loss of generality, let us assume that at stage $i-1$ of their trellises, both the bit-level and symbol-level Viterbi decoders have the same path metric (i.e., the cumulated Euclidean distances), and only two states have much higher probability than the other states, so that we can look at these two states only in the subsequent decoding process. When $y[i] = 2$ is received, the symbol-level PNC demodulator computes the probability of a whole QPSK symbol as in (14), which is then fed into the symbol-level Viterbi decoder. In Fig. 12(a), the branches that output “00” and “11” are selected as survival paths from stage $i-1$ to stage $i$, and the number of possible states remains two. However, in Fig. 12(b), where the trellis of the bit-level decoder is shown, all branches from stage $i-1$ to stage $i$ have equal branch metric due to (16), and the number of possible states doubles. With respect to the same point $y[i+1]$, which is mapped to the unique XOR symbol $1-j$, it helps eliminate one ambiguity state by tracing back, and only one possible state is left in Fig. 12(a). However, if the bit-level decoder

[Figure 12 legend: States in Viterbi Decoder; Most Likely State at Each Stage; Trellis Branches: Input Bit / Output Bits; Paths Eliminated. Panels: (a) Symbol-level; (b) Bit-level]

Fig. 12: Examples of trellis diagrams for Viterbi decoding with two consecutive constellation points, $y[i] = 2$ followed by $y[i+1] = 2 + 2j$: (a) a symbol-level decoder; (b) a bit-level decoder.


is used in Fig. 12(b), $y[i+1]$ cannot reduce the number of
possible states to one at stage $i+1$ because the number of
possible states is doubled at stage $i$.

C. Higher-order Modulations beyond QPSK 

In this subsection, we consider higher-order modulations 
beyond QPSK, and use 16-QAM as an example. As with 
QPSK, the four bits in an overlapped 16-QAM symbol 
are also correlated because of Gray mapping: their inter-bit 
dependency and correlation are more complicated than those 
in QPSK. Therefore, the information loss problem in 16- 
QAM bit-level decoders becomes more severe, and symbol- 
level decoding is needed. For 16-QAM, the same reduced- 
constellation techniques used in QPSK can be applied. In the 
following, we present the symbol-level PNC decoder design 
for 16-QAM as an example. 

The 16-QAM symbol-level PNC demodulator reduces the 256 constellation points to 16 constellation points for demodulation. The soft information of each symbol can be obtained using the equation in (17), where we choose symbol $3+3j$ as an example. The soft symbol-level information is then fed into a 16-QAM Viterbi decoder to decode the XOR packet.

We remark that 16-QAM’s demodulation complexity in¬ 
creases significantly compared with that of QPSK. This 
is due to the increase of the constellation size from 16 
points to 256 points (QPSK to 16-QAM), and the 16- 
QAM demodulator has to compute 256 Euclidean distances 
to select points under the reduced-constellation principle. 
Considering that the joint symbol to XOR mapping is ir¬ 
regular because different nodes have different channel gains 
(i.e., the constellation map is not in a nice symmetric form 
because receiver-side equalization of the two channel gains 
simultaneously is impossible for PNC), exhaustive search 
may be required. Fortunately, unlike for PNC, the decision regions for the MUD can be reshaped into a regular 16-QAM shape [34] using successive interference cancellation (SIC) and equalization. Based on this property, we put forth a reduced-complexity 16-QAM NCMA demodulator in the appendix. We further show that the BER performance of the reduced-complexity MUD decoder is exactly the same as that of the exhaustive-search MUD decoder. After that, we extend the reduced-complexity principle from MUD to PNC, also in the appendix.

For the channel decoder, we remind the reader that each end node still adopts the IEEE 802.11 rate-1/2 convolutional code [34], and every input bit generates two output bits in one state transition (see I/O in Fig. 13(a)). Since one QPSK symbol has two bits, which matches the number of output bits in one state transition, the trellis diagram in Fig. 13(a) can be directly used by the QPSK symbol-level Viterbi decoder.

However, each 16-QAM symbol contains four bits, and the soft information of one 16-QAM symbol is related to two adjacent state transition outputs. To utilize the 16-QAM symbol information, we have to modify the trellis diagram of the standard rate-1/2 convolutional code. The modifications are shown in Fig. 13(b). In particular, we merge two adjacent state transitions (of QPSK) into one state transition. For example, the two consecutive branches “0/00” and “0/10” in Fig. 13(a) are merged into a new branch “00/0010” in Fig. 13(b). Note that the number of states in the new trellis diagram remains the same as in the original trellis diagram, each state has four outgoing branches (namely, “00”, “01”, “10” and “11”), and the total number of stages in the new trellis diagram is half that of the original trellis diagram.
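The merging of two adjacent state transitions into one can be sketched in Python for the $[133,171]_8$ rate-1/2 code. This is a minimal sketch, not the gr-trellis implementation: the shift-register bit ordering (newest input bit in the most significant position) and the dictionary-based trellis representation are assumptions made for illustration.

```python
G = (0o133, 0o171)          # generator polynomials in octal
K = 7                       # constraint length
NUM_STATES = 1 << (K - 1)   # 64 states

def step(state, bit):
    """One transition of the rate-1/2 encoder: (next_state, (out0, out1)).

    Assumed convention: the newest input bit occupies the most
    significant position of the 7-bit shift register.
    """
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out

def merged_step(state, bits):
    """Two consecutive transitions merged into one branch with four output bits."""
    s1, o1 = step(state, bits[0])
    s2, o2 = step(s1, bits[1])
    return s2, o1 + o2

# Merged trellis: (state, (b0, b1)) -> (next_state, four output bits)
merged = {(s, (b0, b1)): merged_step(s, (b0, b1))
          for s in range(NUM_STATES) for b0 in (0, 1) for b1 in (0, 1)}

print(len(merged), merged[(0, (1, 0))])
```

Each state in `merged` has four outgoing branches, each carrying four coded bits (one 16-QAM symbol), while the number of states is unchanged and a packet spans half as many stages.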

The BER performance of the 16-QAM symbol-level PNC 
and MUD decoders are plotted in Fig. [^c). We can see that 
when $\Delta\phi_1 = \Delta\phi_2 = 0$, the 16-QAM symbol-level XOR-CD decoder improves the BER performance significantly, whereas the symbol-level MUD-CD decoder improves very little. In general, the symbol-level PNC decoder improves more than the symbol-level MUD decoder, relative to their respective bit-level counterparts (also see the experimental results in Section VII-B2). For simplicity, we use QPSK as an example to explain this phenomenon in the appendix. When $\Delta\phi_1 = \pi/2$ and $\Delta\phi_2 = 0$, the symbol-level MUD-CD decoder also outperforms its bit-level counterpart. This suggests that combining MIMO (to increase degrees of freedom) and symbol-level decoding (to avoid information

$$\log\Pr(x_{A\oplus B} = 3+3j \mid y_{R1}, y_{R2}) \propto \max_{(x_A,x_B)\in\chi_{x_{A\oplus B}=3+3j}} \{-|y_{R1} - h_{A1}x_A - h_{B1}x_B|^2 - |y_{R2} - h_{A2}x_A - h_{B2}x_B|^2\} \propto \min_{(x_A,x_B)\in\chi_{x_{A\oplus B}=3+3j}} \{d_{3+3j}^2(y_{R1}) + d_{3+3j}^2(y_{R2})\}. \tag{17}$$


[Figure 13 legend: States in Viterbi Decoder; I/O Trellis Branches: Input Bit(s) / Output Bits]



Fig. 13: Trellis diagrams for symbol-level Viterbi decoders using (a) QPSK; (b) 16-QAM. Every two adjacent state transitions 
in (a) are merged into one state transition in (b). 


loss) is a practical solution to tackle the subtleties in 16-QAM 
NCMA systems. 

In the next section, we will evaluate MIMO-NCMA 
decoders’ performances through experiments on software- 
defined radio. The experimental settings differ from the above 
simulation settings in two major ways: (1) in a real system,
$\Delta\phi_1$ and $\Delta\phi_2$ may not be exactly 0, $\pi/4$ or $\pi/2$; (2) more
importantly, in a broadband system such as OFDM, $\Delta\phi_1$
and $\Delta\phi_2$ may vary across different subcarriers (i.e., not
all samples within a packet incur the same relative phase
offsets). 

In other words, the simulations only serve to highlight certain points, but do not reflect what actually happens in practice. The experimental study (e.g., PHY-layer decoding statistics and packet decoding ratios) fills the gap left by the simulation study: the merit of MIMO-NCMA needs to be validated in a real wireless communications system.

VII. Experimental Results 

For performance evaluation, we implemented MIMO-NCMA on software-defined radio. Section VII-A presents the implementation details and experimental setup, and Section VII-B presents the experimental results.


A. Implementation Details and Experiment Setup 

The MIMO-NCMA system was built on the USRP hardware
[35] and the GNU Radio software with the UHD driver.
We extended the single-antenna BPSK NCMA system in Q 
as follows: 

a) We added one more antenna at the AP and changed the 
SISO system of Q so that the AP can receive data from 
the two antennas. The end nodes still use one antenna. 


b) We modified the transceiver design so as to support 
QPSK and 16-QAM in addition to BPSK. 

c) We realized the QPSK and 16-QAM XOR-CD and 
MUD-CD decoders based on the reduced-constellation 
demodulation principle, incorporating symbol-level de¬ 
coding. 

For experimentation, we deployed three sets of USRP N210s with SBX daughterboards. Each MIMO-NCMA end node is one USRP connected to a PC through an Ethernet cable, and the MIMO-NCMA AP has two USRPs connected through one MIMO cable to behave like one node with two antennas. For the uplink channel, the AP sends beacon frames to trigger node A's and node B's simultaneous transmissions. Our experiments were carried out at a 2.585 GHz center frequency with 5 MHz bandwidth.

To benchmark MIMO-NCMA, we consider three systems: 

1) Single-antenna NCMA system (Single-NCMA) 

This system is based on the previous single-antenna 
NCMA (Tj. In this system, all nodes have only one 
antenna each. We extend the system in 0 to
support QPSK modulation in addition to BPSK. Both 
MUD decoder and PNC decoder are used. PHY-layer 
bridging and MAC-layer bridging are performed in the 
decoding process to increase the system throughput. 

2) Distributed MIMO System (MIMO-MUD) 

This is a distributed MIMO-MUD system, where the 
receiver at the AP has two antennas and the transmitters 
at the two end nodes have only one antenna each. 
Conventional hard-input-hard-output ZF (zero-forcing)
and MMSE (minimum mean square error) decoders are 
adopted for MUD decoding. 

3) MIMO NCMA system (MIMO-NCMA)

[Figure 14 panels: (a) BPSK Single-antenna NCMA (Benchmark); (b) QPSK Single-antenna NCMA (Benchmark); (c) QPSK MIMO-MUD MMSE (Benchmark). Horizontal axes: SNR (dB)]


Fig. 14: PHY-layer packet statistics versus SNR for different benchmark decoding schemes with balanced received powers 
of two end nodes: (a) BPSK Single-NCMA; (b) QPSK Single-NCMA; (c) QPSK MIMO-MUD (MMSE). 



Fig. 15: MIMO-NCMA PHY-layer packet statistics compar¬ 
ison with the bit-level decoder using QPSK. 


This is the high-order NCMA system proposed in 
this paper. Both XOR-CD and MUD-CD decoders are 
adopted. PHY-layer and MAC-layer bridging are used.


B. Experimental Results 


We study both PHY-layer and MAC-layer performances: 
we first consider the PHY-layer packet decoding statistics 
of bit-level PNC and MUD decoders, and then we compare 
the decoding performance of the bit-level and symbol-level 
MIMO-NCMA decoder. After that, we evaluate the MAC- 
layer throughput. 

1) Bit-level Decoding Performance: We first study the bit-level decoding performance. We collected the QPSK PHY-layer decoding statistics of Single-NCMA, MIMO-MUD and MIMO-NCMA, and present the results in Fig. 14 and Fig. 15 (all systems use bit-level decoding). There are eight possible decoding outcomes in each time slot (see Section III-B) when PNC and MUD decoders are used jointly in Single-NCMA and MIMO-NCMA systems. We group some outcomes together as follows:


• NONE = (iv)(b) (no packet decoded).

• X = (iv)(a) (only XOR packet decoded).

• A|B = (ii)(b) or (iii)(b) (either only packet A or only packet B decoded).

• AX|BX = (ii)(a) or (iii)(a) (XOR packet plus either packet A or packet B decoded).

• AB = (i)(b) (both packets A and B decoded; XOR packet not decoded).

• ABX = (i)(a) (both packets A and B decoded; XOR packet decoded).


We also plot the packet decoding ratios of PNC and MUD decoders versus SNR with (a) QPSK and (b) 16-QAM in Fig. 16, including both bit-level and symbol-level decoding; we focus on bit-level decoding in this subsection. We define the PNC packet decoding ratio as the number of decoded PNC packets divided by the total number of time slots; the MUD packet decoding ratio is the number of decoded native packets divided by the total number of packets transmitted by nodes A and B.
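The two packet decoding ratios can be computed directly from per-slot counts of the grouped outcomes. The following Python sketch shows the bookkeeping; the function name and the example counts are hypothetical and are not measurements from the experiments.

```python
def decoding_ratios(counts):
    """Return (PNC ratio, MUD ratio) from per-slot outcome counts.

    counts maps the grouped outcome labels NONE, X, A|B, AX|BX, AB, ABX
    to the number of time slots in which each outcome occurred.
    """
    slots = sum(counts.values())
    # Slots in which the XOR (PNC) packet was decoded.
    xor_decoded = counts["X"] + counts["AX|BX"] + counts["ABX"]
    # Native packets decoded: one in A|B and AX|BX slots, two in AB and ABX slots.
    native_decoded = (counts["A|B"] + counts["AX|BX"]
                      + 2 * (counts["AB"] + counts["ABX"]))
    pnc_ratio = xor_decoded / slots
    mud_ratio = native_decoded / (2 * slots)   # two packets sent per slot
    return pnc_ratio, mud_ratio

# Hypothetical counts over 1,000 time slots (illustrative only).
example = {"NONE": 100, "X": 50, "A|B": 150, "AX|BX": 200, "AB": 100, "ABX": 400}
pnc_ratio, mud_ratio = decoding_ratios(example)
print(pnc_ratio, mud_ratio)
```

Note that the two ratios use different denominators: time slots for PNC, and transmitted packets (two per slot) for MUD.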


We performed controlled experiments for different re¬ 
ceived SNRs, and the received powers of signals from nodes 
A and B at the AP were adjusted to be approximately 
balanced (we remark that the powers of each pair could be 
slightly different due to channel fading, and the SNR pre¬ 
sented here is the average SNR of all the received packets). 
We calculated the SNR using the method in p^ , and varied 
the SNR values from 6.5 to 9 dB when the AP has a single
antenna. When the AP has two antennas, for QPSK, we
varied the SNR from 9 to 11.5 dB since the effective receive 
power using the two antennas is almost doubled compared 
with single antenna case pQ| . For 16-QAM, we varied the 
SNR from 18 to 23 dB. For each SNR value, the AP sent 
1,000 beacon frames to trigger simultaneous transmissions 
of two end nodes and each uplink packet has a 400-byte 
payload. 


Observation 1 : Single-NCMA fails to support QPSK 

Sections IV-A and IV-B have discussed the potential phase penalty associated with PNC and MUD decoders when QPSK is adopted in a single-antenna NCMA system. Our experimental results corroborate the theoretical and simulation analysis. The PHY-layer decoding statistics in Fig. 14(b) show that neither the PNC decoder nor the MUD decoder works well.
Even at SNR=9dB (the working region for a point-to-point 
QPSK WLAN system p^), there are almost no decoded 
packets. 


Observation 2 : MIMO-NCMA works well for QPSK

Fig. 16: Packet decoding ratio comparison of bit-level and symbol-level NCMA decoders with (a) QPSK and (b) 16-QAM.


From Fig. 15, we can see that the number of decoded packets for both bit-level PNC and MUD decoders increases drastically for MIMO-NCMA, compared with Single-NCMA in Fig. 14(b). At 9 dB, around 70% of the packets can be decoded correctly (either a single packet or two packets), and at 11.5 dB, the PER can be lower than 5%. The performance of both PNC and MUD decoders improves with the additional antenna. Furthermore, we can see that the MUD decoder based on the reduced-constellation principle has a better PER performance than the conventional MMSE decoder^ for QPSK (e.g., sum up the decoding outcomes “ABX” and “AB” in Fig. 15 and then compare with “AB” in Fig. 14(c)),
which is consistent with the BPSK results in ©■ 
Observation 3 : MIMO-NCMA with Bit-level decoding 
does not perform well for 16-QAM 

In Fig. 16(a), we can see that the QPSK bit-level MUD decoder has a packet decoding ratio over 80% at an SNR of 11.5 dB. However, the bit-level decoder does not perform well when 16-QAM is adopted. Fig. 16(b) shows that the packet decoding ratios for both 16-QAM bit-level PNC and MUD decoders are below 50% at 20 dB. We next describe results showing that symbol-level decoding can improve the performance of 16-QAM PNC and MUD decoders.

2) Symbol-level Decoding Performance: As discussed in Section VI, bit-level PNC and MUD demodulators lead to information loss when high-order modulations are used. In the following, we compare the performances of bit-level and symbol-level MIMO-NCMA decoders. From Fig. 16, we can see that the symbol-level PNC decoder improves more than the symbol-level MUD decoder, relative to their respective bit-level counterparts. An intuitive explanation for the different improvements of the symbol-level PNC and MUD decoders is given in the appendix.

For QPSK MIMO-NCMA, bit-level decoding is already good and symbol-level decoding does not bring much improvement. The reason is that the bit-level MUD decoder already decodes both packets A and B in most time slots (e.g., in Fig. 15, the success rate of decoding both packets A and B is over 70% at 11 dB). Fig. 16(a) also shows that, for QPSK, the symbol-level MUD decoder offers little improvement over the bit-level MUD decoder. Furthermore, because the MUD decoder can already obtain both packets A and B in most time slots, XOR packets obtained by the PNC decoder do not give much further improvement. Although the symbol-level PNC decoder can obtain more XOR packets than the bit-level PNC decoder, the extra XOR packets do not help when both packets A and B are already available. We will see in Section VII-B3 that the QPSK MAC-layer throughput for the symbol-level decoder improves very little.
Observation 4 : MIMO-NCMA with Symbol-level decod¬ 
ing improves 16-QAM performance significantly 

Unlike with QPSK, we find that with 16-QAM, the symbol-level decoders of both PNC and MUD outperform their bit-level counterparts substantially. From Fig. 16(b), we can see that at an SNR of 20 dB, the packet decoding ratios for symbol-level PNC and MUD decoders increase by 20% and 12%, respectively, over their corresponding bit-level decoders. This large improvement is attributed to the bit-level decoders losing too much information when mapping one 16-QAM symbol into four independent BPSK symbols.

Observation 5 : Symbol-level NCMA decoder supports 
real-time operation 

In view of our final goal of designing real-time NCMA decoders, a natural question is whether this performance gain is accompanied by increased decoding complexity. Next, we evaluate the real-time decoding speeds by measuring the processing time for the two main components in NCMA decoders: (1) the channel decoder, and (2) the demodulator. In Appendix D, we show that the complexities of the symbol-level and bit-level Viterbi decoders are the same for QPSK and 16-QAM modulations. For 64-QAM (or even higher-order modulations), however, the symbol-level Viterbi decoder has a higher complexity^. We implemented the symbol-level Viterbi decoder using the GNU Radio gr-trellis library [36]. The measured decoding time confirms that the
symbol-level and bit-level GNU Radio decoders are com- 


^Fig. 14(c) shows the PHY-layer statistics of the MMSE decoder. Although having two antennas at the AP improves the PER performance over single-antenna NCMA in Fig. 14(b), the performance of MMSE is still subpar (i.e., still less than 50% of the packets are decoded correctly).


^Eor the 64-QAM symbol-level Viterbi decoder, if a rate-1/2 convolutional 
code is used, we merge every three adjacent state transitions of the trellis 
into one state transition. Every state in the merged trellis has eight output 
branches. 
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Fig. 17: MAC-layer performance: Throughput compari- Fig. 18: MAC-layer performances of MIMO-NCMA with 16- 
son of different schemes under different SNRs with RS QAM: Throughput comparison of different schemes under dif- 
code’s constraint length La = 1.51/^ = 24. ferent SNRs with RS code’s constraint length La = 1.51/^ = 24. 


parable in speed (e.g., around 2ms), which corroborates the 
theoretical complexity analysis (see Appendix [D| for details). 

For demodulators, we can see from Appendix D that
the processing time for QPSK MIMO-NCMA is small
and comparable to the previous BPSK single-antenna case.
However, the demodulation time for 16-QAM increases
significantly due to the increase of constellation size, as
discussed in Section VI-C. Fortunately, with our proposed
reduced complexity demodulators in Appendices A and B,
the demodulation time is greatly reduced
(e.g., see Appendix D for the reduced demodulation times
and Appendices A and B for BER performance comparisons).

3) MAC-layer Throughput Performance: We now evaluate
the overall NCMA throughput performance at the MAC
layer. In NCMA, the PHY layer could decode one or two
independent packets in one time slot (we treat the cases of
ABX, AX, BX, and AB as having two independent packets,
and the cases of A, B, and X as having one packet). For
benchmarking, we first derive a theoretical upper bound for
the overall MAC-layer normalized throughput imposed by
the PHY-layer received data. The upper bound of NCMA is
given by

\[
\text{Upper Bound} = 2 \times \left( \Pr\{ABX\} + \Pr\{AX|BX\} + \Pr\{AB\} \right)
+ 1 \times \left( \Pr\{A|B\} + \Pr\{X\} \right). \tag{18}
\]

We remark that, for the same normalized throughput, the
absolute throughputs for QPSK and 16-QAM are twice and
four times that of BPSK, respectively. The upper bound in
(18) and the experimental results below should be interpreted in
this light.
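
As a quick numerical illustration of the bound in (18), the following Python sketch evaluates it from a set of PHY-layer event probabilities. The function name, the dictionary keys, and the example probabilities are hypothetical and chosen only for illustration; they are not measured values from the experiments.

```python
def ncma_upper_bound(pr):
    """Evaluate the upper bound in (18), in packets per time slot.

    `pr` maps each PHY-layer decoding event to its probability:
    "ABX", "AX|BX" and "AB" deliver two independent packets,
    "A|B" and "X" deliver one packet.
    """
    events = ("ABX", "AX|BX", "AB", "A|B", "X")
    if any(pr[e] < 0.0 for e in events) or sum(pr[e] for e in events) > 1.0 + 1e-9:
        raise ValueError("event probabilities must be non-negative and sum to at most 1")
    two_packets = pr["ABX"] + pr["AX|BX"] + pr["AB"]
    one_packet = pr["A|B"] + pr["X"]
    return 2.0 * two_packets + 1.0 * one_packet


# Hypothetical probabilities (the remaining 0.1 is the event of decoding nothing).
example = {"ABX": 0.5, "AX|BX": 0.1, "AB": 0.1, "A|B": 0.1, "X": 0.1}
print(f"upper bound = {ncma_upper_bound(example):.2f} packets per time slot")
```

Since the bound is normalized, the same value corresponds to twice (QPSK) or four times (16-QAM) the absolute throughput of BPSK.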

We examine the MAC-layer performance by employing
trace-driven simulations using the PHY-layer statistics
obtained in Section VII-B2. We first obtain the
probabilities of all events (i.e., ABX, AB, AX|BX, A|B
and X). Then, we generate traces based on the PHY-layer
statistics to drive our MAC-layer simulations.

The normalized throughput for NCMA systems is defined
as

\[
\mathrm{Th} = \frac{L_A \times N_A + L_B \times N_B}{N_{\mathrm{Beacon}}}, \tag{19}
\]

where $N_A$ ($N_B$) is the number of messages of node A (B)
that have been recovered, $N_{\mathrm{Beacon}}$ is the number of beacons, and
$L_A$ ($L_B$) is the number of packets the AP needs to decode.^

Fig. 17 plots the MAC-layer normalized throughputs of
different schemes using QPSK. In Fig. 17, QPSK MIMO-NCMA's
achievable throughput almost coincides with the
theoretical upper bound (we show in Section VII-B2 that both
PNC and MUD symbol-level decoders work better than their
corresponding bit-level decoders, so here we only present the
upper bound of NCMA using symbol-level decoders, namely
the Upper Bound (Symbol) in Fig. 17 and Fig. 18). From Fig. 17,
we can see that MIMO-NCMA works well with QPSK,
and doubles the throughput of BPSK NCMA. We also include
conventional MMSE and ZF decoders as our benchmarks
in Fig. 17. MIMO-NCMA has around 80% to 100% throughput
improvement over them for all SNRs.

Note that in Fig. 17, the QPSK MAC-layer throughput for
the symbol-level decoder improves very little compared with
the bit-level decoder. That is, the joint use of MIMO and
bit-level decoding has already provided good performance
for QPSK NCMA.

In contrast, the 16-QAM MAC-layer throughput for the
symbol-level decoder improves more significantly, as shown
in Fig. 18. The MAC-layer throughput result is consistent
with the PHY layer decoding results, i.e., both 16-QAM 
symbol-level PNC and MUD decoders outperform their cor¬ 
responding bit-level decoders significantly (see Fig. p^b)). 
That is, for 16-QAM, both MIMO and symbol-level decoding 
are needed for good performance. 
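
The normalized throughput definition in (19) amounts to a simple ratio, sketched below in Python. The function name and the message and beacon counts are hypothetical; only the values $L_A = 24$ and $L_B = 16$ follow from the setting $L_A = 1.5 L_B = 24$ used in this paper.

```python
def normalized_throughput(l_a, n_a, l_b, n_b, n_beacon):
    """Normalized throughput of (19), in packets per time slot.

    l_a, l_b : RS-code parameters L_A and L_B (packets per message)
    n_a, n_b : numbers of recovered messages of nodes A and B
    n_beacon : number of beacons (time slots) in the trace
    """
    if n_beacon <= 0:
        raise ValueError("n_beacon must be positive")
    return (l_a * n_a + l_b * n_b) / n_beacon


# Hypothetical trace: 10 messages recovered from each node over 250 beacons.
th = normalized_throughput(l_a=24, n_a=10, l_b=16, n_b=10, n_beacon=250)
print(f"normalized throughput = {th:.2f} packets per time slot")
```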

We also find that, as expected, 16-QAM requires higher
SNRs than QPSK (e.g., to achieve a normalized throughput
of one packet per time slot, 16-QAM requires 18dB
while QPSK only requires 9dB). The normalized
throughput for 16-QAM saturates at around 1.5 packets
per time slot, which is lower than the 1.8 packets for QPSK.

^In NCMA, the MAC-layer RS code's constraint length parameter
L (see Section ) can be different for different nodes. We choose
$L_A = 1.5 L_B = 24$ based on our prior experimental results: the detailed
explanation and justification for using asymmetric $L_A$ and $L_B$ can be found
in 0.

This is because, in 16-QAM, overlapped constellation
points (i.e., different symbol pairs) inevitably map to different XOR
symbols (an ambiguity problem) due to
the dense constellation map, and the number of constellation
points in each cluster may be much larger than that in QPSK.
Fortunately, with symbol-level PHY-layer decoding, the 16-QAM
MIMO-NCMA can achieve 3.5 times the throughput
of the previous BPSK NCMA system.

VIII. Conclusions 

We have developed the first NCMA system that operates 
on high-order modulations beyond BPSK. In particular, we 
have demonstrated the feasibility of NCMA operated with 
QPSK and 16-QAM modulations. Moving from BPSK to 
these high-order modulations presents a number of chal¬ 
lenges, among which the phase offset penalty is the most 
critical one. To tackle the phase penalty problem while retain¬ 
ing decoding simplicity (we use only non-iterative decoding), 
we put forth the idea of using a combination of two antennas 
and symbol-level decoding at the AP. We refer to our high- 
order NCMA system as MIMO-NCMA. Experiments on 
our software-defined radio prototype show that at SNR =
10dB, our MIMO-NCMA system with QPSK modulation
can double the throughput of the previous BPSK NCMA 
system. At SNR=20dB, the throughput of MIMO-NCMA 
with 16-QAM is 3.5 times that of BPSK NCMA. Overall, our 
results indicate that MIMO-NCMA is a promising technique 
to boost the throughput of NCMA at medium to high SNRs. 

Appendix A 

Reduced Complexity MUD Demodulator 

This appendix presents the reduced complexity NCMA 
demodulator (using 16-QAM as an example), and focuses 
on the symbol-level demodulator. In particular, we show that 
for MUD, the constellation of the received signal can be 
reshaped into a standard 16-QAM constellation after 
equalization on one user, and doing so reduces demodulation 
complexity. We further demonstrate that the BER of the
reduced complexity MUD decoder is exactly the same as that
of the exhaustive-search MUD decoder studied in Section VI.
In Appendix B, we adapt the reduced complexity principle
of MUD for PNC.
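
The reduction can be illustrated with a minimal NumPy sketch, assuming an unnormalized 16-QAM alphabet with levels ±1 and ±3 and perfectly known channel gains. The function names (`slice_16qam`, `mud_soft_info_reduced`, `mud_soft_info_exhaustive`) and the random test channel are hypothetical and are not taken from the prototype. For each hypothesis of $x_A$, the sketch cancels its contribution, combines the two antenna samples with MRC, slices the equalized value on the fixed 16-QAM grid, and evaluates a single Euclidean distance; the exhaustive search over all 256 pairs is included only for comparison.

```python
import numpy as np

# Unnormalized 16-QAM alphabet (assumption: levels -3, -1, 1, 3 on each axis).
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])
QAM16 = np.array([complex(i, q) for i in LEVELS for q in LEVELS])


def slice_16qam(z):
    """Nearest 16-QAM point using fixed per-axis decision regions."""
    def nearest_level(v):
        # Round to the nearest odd integer, then clip to the outermost levels.
        return float(np.clip(2.0 * np.round((v - 1.0) / 2.0) + 1.0, -3.0, 3.0))
    return complex(nearest_level(z.real), nearest_level(z.imag))


def mud_soft_info_reduced(y, h_a, h_b):
    """One distance per hypothesis of x_A, found with MRC and slicing."""
    h_mrc = np.sum(np.abs(h_b) ** 2)
    dists = np.empty(len(QAM16))
    for k, x_a in enumerate(QAM16):
        residual = y - h_a * x_a                # cancel the hypothesized x_A
        y_mrc = np.vdot(h_b, residual)          # sum of conj(h_B) * residual
        x_b_hat = slice_16qam(y_mrc / h_mrc)    # equalize, then slice
        dists[k] = np.sum(np.abs(residual - h_b * x_b_hat) ** 2)
    return dists


def mud_soft_info_exhaustive(y, h_a, h_b):
    """Same quantity obtained by searching all 256 pairs (x_A, x_B)."""
    d = np.abs(
        y[:, None, None]
        - h_a[:, None, None] * QAM16[None, :, None]
        - h_b[:, None, None] * QAM16[None, None, :]
    ) ** 2
    return d.sum(axis=0).min(axis=1)            # minimize over x_B for each x_A


# Hypothetical two-antenna channel and noisy observation.
rng = np.random.default_rng(1)
h_a = rng.normal(size=2) + 1j * rng.normal(size=2)
h_b = rng.normal(size=2) + 1j * rng.normal(size=2)
noise = 0.1 * (rng.normal(size=2) + 1j * rng.normal(size=2))
y = h_a * QAM16[3] + h_b * QAM16[10] + noise
same = np.allclose(mud_soft_info_reduced(y, h_a, h_b),
                   mud_soft_info_exhaustive(y, h_a, h_b))
print("reduced and exhaustive distances agree:", same)
```

The two functions return the same distances because, for a fixed $x_A$, minimizing the sum of the two squared residuals over $x_B$ is equivalent to picking the constellation point nearest to the MRC-equalized sample.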

Without loss of generality, let us focus on the demodulation
of node A's information. To obtain the soft information
of $x_A$, we need to find 16 representative constellation
points associated with different $x_A$ out of the 256 possible
input symbol pairs $(x_A, x_B)$ based on the received samples
$(y_{R1}, y_{R2})$. To calculate the soft information
of a particular realization of $x_A$, say $x_A = \tilde{x}_A$, the received
samples can be expressed as

\[
\begin{aligned}
y_{R1} &= h_{A1} x_A + h_{B1} x_B + w_1, \\
y_{R2} &= h_{A2} x_A + h_{B2} x_B + w_2.
\end{aligned}
\tag{20}
\]

Since we have two equations with one unknown variable
(which can take any value of a 16-QAM symbol), maximum
ratio combining (MRC) p0| can be adopted. We have

\[
\underbrace{h_{B1}^{*} y_{R1} + h_{B2}^{*} y_{R2} - \left( h_{B1}^{*} h_{A1} + h_{B2}^{*} h_{A2} \right) x_A}_{y^{\mathrm{MRC}}}
= \underbrace{\left( |h_{B1}|^2 + |h_{B2}|^2 \right)}_{h^{\mathrm{MRC}}} x_B
+ \underbrace{h_{B1}^{*} w_1 + h_{B2}^{*} w_2}_{w^{\mathrm{MRC}}}, \tag{21}
\]

where $*$ denotes the complex conjugate operation. Through
equalization, i.e., dividing both sides of (21) by $h^{\mathrm{MRC}}$, we have
$y^{\mathrm{MRC}} / h^{\mathrm{MRC}} = x_B + w^{\mathrm{MRC}} / h^{\mathrm{MRC}}$.
The estimate of $x_B$, denoted by $\hat{x}_B$, can therefore be
demodulated using the fixed 16-QAM grid decision regions
with standard approaches. The soft information of $x_A$ (e.g.,
the representative Euclidean distance for the inputs of the
Viterbi channel decoder) can then be expressed as

\[
d_{x_A} = \left| y_{R1} - h_{A1} x_A - h_{B1} \hat{x}_B \right|^2
+ \left| y_{R2} - h_{A2} x_A - h_{B2} \hat{x}_B \right|^2. \tag{22}
\]

Instead of finding the smallest value among 16 points through
distance computation, we can simply find the nearest $x_B$
based on the fixed 16-QAM decision regions. Similarly,
to obtain the soft information of $x_B$, we can find one
constellation point $(x_A, x_B)$ for each $x_B = \tilde{x}_B$ and then
compute the soft information $d_{x_B}$. In Fig. 19, the BER
performance based on our experimental data shows that the
SIC-based reduced complexity MUD decoder has exactly the
same BER performance as the original exhaustive-search
one, but with much less demodulation time (e.g.,
around 10% of the exhaustive search; see Table III in Section
VII-B3).

Fig. 19: Experimental BER performance for 16-QAM MUD
and PNC decoders with reduced complexity demodulators
and original demodulators. The reduced-complexity MUD
decoder has the same BER performance, and the reduced-complexity
PNC decoder has less than 1dB performance loss.
$S_A$ ($S_B$) is the set of selected points to represent $x_A$ ($x_B$),
and $|\cdot|_{\max}$ denotes the maximum size of a set.

Appendix B

Reduced Complexity PNC Demodulator

For the MUD demodulator, we can make use of SIC and
MRC to reduce the demodulation complexity. However, for
an optimal PNC demodulator, the decision on $x_{A \oplus B}$ has
to be made based on a joint symbol pair $(x_A, x_B)$, rather
than performing MUD marginalization first and then XOR
0. As a result, the optimal PNC demodulation region is 
irregular since we cannot perform equalization to get rid of 
two different channel gains of nodes A and B simultaneously. 
In this appendix, we try to strike a balance between performance
and complexity by extending the MUD complexity
reducing scheme to approach the performance of an optimal
PNC demodulator. We remark that the proposed reduced
complexity PNC demodulator will degrade to the MUD
demodulator presented in Appendix A if we only consider
the 16 selected constellation points for each of $x_A$ and
$x_B$ in MUD demodulation; on the other hand, the reduced
complexity PNC demodulator becomes the exhaustive-search
optimal PNC demodulator if we consider all the 256 constellation
points. In the following, we use the MUD complexity
reduction scheme as an example to explain the reduced-complexity
principle for the PNC demodulator.

Let $S_A$ and $S_B$ denote the sets of selected constellation
points that represent $x_A$ and $x_B$ in the reduced complexity
MUD demodulator, respectively (e.g., we select one nearest
point to represent each $x_A = \tilde{x}_A$ and $x_B = \tilde{x}_B$, such that
both $S_A$ and $S_B$ contain 16 constellation points). We find
representatives of $x_{A \oplus B}$ from $S_A \cup S_B$ (the union of sets
$S_A$ and $S_B$) since each constellation point will be mapped
to its corresponding XOR symbol $x_{A \oplus B}$. Suppose that we
want to find the representative of $x_{A \oplus B} = \tilde{x}_{A \oplus B}$. We search
for all possible $(x_A, x_B)$ in $S_A \cup S_B$ that map to $\tilde{x}_{A \oplus B}$
and select the most likely one, i.e., the one with the smallest Euclidean
distance. We refer to this PNC demodulator as the nearest-point
demodulator.

Fig. 19 shows that with the nearest-point demodulator
(i.e., the green curve), the reduced complexity PNC decoder
has around 2dB BER performance loss compared with the
optimal PNC decoder (i.e., the black curve). The reason for
the BER performance degradation is that we only consider
a small subset of the total 256 constellation points (i.e., the
maximum size of $S_A \cup S_B$ is 32), so most of the time it is
difficult to find all 16 XOR representatives.

To narrow the performance gap, we need to consider
more constellation points. For example, we can select
more than one nearest point to represent each $x_A = \tilde{x}_A$
($x_B = \tilde{x}_B$). Let us consider four nearest points, which
follows the sphere decoding principle [30]. The size of $S_A$
or $S_B$ now becomes 64, and the maximum size of $S_A \cup S_B$
is 128 (the union is smaller than 128 if $S_A$ and $S_B$
share pairs). The resulting BER performance
improves (i.e., the red curve in Fig. 19) and almost
coincides with that of the original exhaustive-search PNC decoder.
For the PNC decoder, there exists a fundamental tradeoff between
demodulation complexity and BER performance.
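
The nearest-point principle and its extension to several nearest points can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the prototype's code: symbols are identified by their index (0 to 15) in a hypothetical 16-QAM labeling, the XOR symbol of a pair is taken as the bitwise XOR of the two indices, and the function names and the random channel are invented for the example. For clarity the sketch computes the full 16 × 16 distance table and only restricts the search to $S_A \cup S_B$; a practical demodulator would obtain the nearest points without the exhaustive table.

```python
import numpy as np

# Unnormalized 16-QAM alphabet; the index of a point serves as its 4-bit label (assumption).
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])
QAM16 = np.array([complex(i, q) for i in LEVELS for q in LEVELS])


def pair_distances(y, h_a, h_b):
    """16x16 table: entry [i, j] is the squared distance for (x_A, x_B) = (QAM16[i], QAM16[j])."""
    d = np.abs(
        y[:, None, None]
        - h_a[:, None, None] * QAM16[None, :, None]
        - h_b[:, None, None] * QAM16[None, None, :]
    ) ** 2
    return d.sum(axis=0)


def candidate_pairs(dist, k):
    """S_A union S_B, keeping the k nearest pairs for each x_A and for each x_B."""
    s_a = {(i, int(j)) for i in range(16) for j in np.argsort(dist[i, :])[:k]}
    s_b = {(int(i), j) for j in range(16) for i in np.argsort(dist[:, j])[:k]}
    return s_a | s_b


def pnc_soft_info(dist, k):
    """Smallest distance per XOR symbol within S_A union S_B (inf if no representative)."""
    soft = np.full(16, np.inf)
    for i, j in candidate_pairs(dist, k):
        soft[i ^ j] = min(soft[i ^ j], dist[i, j])
    return soft


# Hypothetical two-antenna channel and noisy observation.
rng = np.random.default_rng(7)
h_a = rng.normal(size=2) + 1j * rng.normal(size=2)
h_b = rng.normal(size=2) + 1j * rng.normal(size=2)
noise = 0.1 * (rng.normal(size=2) + 1j * rng.normal(size=2))
y = h_a * QAM16[5] + h_b * QAM16[12] + noise
dist = pair_distances(y, h_a, h_b)
for k in (1, 4, 16):
    pairs = candidate_pairs(dist, k)
    found = int(np.isfinite(pnc_soft_info(dist, k)).sum())
    print(f"k={k}: {len(pairs)} candidate pairs, {found} of 16 XOR symbols represented")
```

With `k=1` the union holds at most 32 pairs and some XOR symbols may have no representative, with `k=4` it holds at most 128 pairs, and with `k=16` the search covers all 256 pairs and coincides with the exhaustive-search demodulator.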

Appendix C 

Performance Improvement Comparison between 
Symbol-level PNC and MUD Decoders 

In the simulation and experimental results, we find that
symbol-level decoding brings substantial improvement over
bit-level decoding for the PNC decoder. By contrast, symbol-level
decoding brings smaller improvement over bit-level
decoding for the MUD decoder. In this appendix, we give an
intuitive explanation for this phenomenon using QPSK as an
example (the analysis for 16-QAM follows similar methods).

To understand the performance improvement of the
symbol-level PNC decoder, let us look at the four constellation
points overlapping at the origin of Fig. 7. The symbol-level
demodulator maps (demodulates) the four overlapping
constellation points $(1+j, -1+j)$, $(-1-j, 1-j)$, $(1-j, 1+j)$
and $(-1+j, -1-j)$ to two possible QPSK XOR symbols
$-1+j$ and $1-j$ (i.e., we get two possible XORed symbol
pairs: (-1, 1) and (1, -1)). These two QPSK XOR symbols
should have the same probability in the soft information for
the XOR symbol, since all four possible input
pairs overlap. This is useful information for the
symbol-level decoder because the decoder can rule out the two
other XOR combinations: (1, 1) and (-1, -1). The PNC bit-level
demodulator, on the other hand, will map the four
constellation points to four XOR pairs: (-1, -1), (-1, 1), (1, -1)
and (1, 1). Hence, no useful information about the input bits
can be obtained through the bit-level demodulation process
(since each XOR bit is equally likely to be 1 or -1). In this
example, the overlapped constellation points are
reduced to a smaller number of XOR symbols through
PNC mapping, and the symbol-level PNC demodulator can
avoid huge information loss. Therefore, we observe a large
improvement of the symbol-level QPSK PNC decoder over
the bit-level one (e.g., see Fig. p^a)). 

However, with the symbol-level MUD demodulator, the
number of possible demodulated MUD symbols always remains
the same as the number of overlapped constellation
points. Therefore, the symbol-level demodulator may still
give the same results as the bit-level one. As an intuitive
example, let us use the four overlapped constellation points
at the origin again. There are four possible symbols for
node A (namely, $1+j$, $-1-j$, $1-j$ and $-1+j$) with
equal probability when we marginalize user B's information.
Thus, the symbol-level MUD decoder also has no useful
information after demodulation, and the improvement of
the symbol-level MUD decoder is not as significant as that of the
PNC decoder. For QPSK, the symbol-level MUD decoder
only improves the performance of bit-level MUD slightly
(e.g., see Fig.[^a), and the small improvement comes from 
overlapping constellation points with fewer potential input 
combination possibilities). 
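
The QPSK example in this appendix can be checked by direct enumeration. The sketch below is illustrative only and rests on two assumptions that are not stated explicitly in the text: the overlap at the origin is produced by a relative phase offset of π/2 (modeled as channel gains 1 and j), and the XOR of two ±1 valued bits is taken as their product. All variable and function names are hypothetical.

```python
from itertools import product

# QPSK alphabet with in-phase and quadrature components in {+1, -1}.
QPSK = [complex(i, q) for i in (1, -1) for q in (1, -1)]

# Assumed channel gains: a relative phase offset of pi/2 between nodes A and B.
H_A, H_B = 1, 1j


def xor_symbol(x_a, x_b):
    """XOR in the +/-1 domain: component-wise product of the I and Q parts."""
    return (int(x_a.real * x_b.real), int(x_a.imag * x_b.imag))


# Symbol pairs whose superimposed constellation point falls on the origin.
overlapped = [(a, b) for a, b in product(QPSK, QPSK) if abs(H_A * a + H_B * b) < 1e-9]

# Symbol-level PNC view: the set of XOR symbols that remain possible.
xor_symbols = {xor_symbol(a, b) for a, b in overlapped}

# Bit-level PNC view: each XOR bit is examined on its own.
xor_i_values = sorted(xor_symbol(a, b)[0] for a, b in overlapped)
xor_q_values = sorted(xor_symbol(a, b)[1] for a, b in overlapped)

# Symbol-level MUD view: symbols of node A that remain possible.
mud_symbols_a = {a for a, _ in overlapped}

print("overlapped pairs:", overlapped)
print("possible XOR symbols (symbol-level PNC):", sorted(xor_symbols))
print("XOR I-bit values:", xor_i_values, "XOR Q-bit values:", xor_q_values)
print("possible symbols of node A (MUD):", len(mud_symbols_a))
```

Under these assumptions the enumeration leaves two XOR symbols for the symbol-level PNC demodulator, an even split of +1 and -1 for each XOR bit taken separately, and all four QPSK symbols for node A in MUD, which matches the discussion above.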

Appendix D 

Decoding Time Measurements for PHY-layer 
Decoders 

In this appendix, we present decoding time measurements
for the PHY-layer MIMO-NCMA decoders, including the
two main components: (1) the channel decoder, and (2) the
demodulator.

We perform experiments to evaluate the real-time decoding
speeds of MIMO-NCMA PHY-layer decoders. The implementation
of the symbol-level Viterbi decoder was developed
using the GNU Radio gr-trellis library [36]. The average
processing times (the mean value over 10,000 packets) for the channel
decoder and the demodulator are shown in Table II(b) and Table
III, respectively.

TABLE II: Complexity analysis and decoding times of bit-level and symbol-level channel decoders: (a) Theoretical
complexity with M-QAM, where N is the number of source bits, K is the number of states and r is the coding rate (K=64,
r=1/2 in this paper). For M-QAM's trellis in symbol-level decoder (M > 4), each state has $2^{r \log_2 M}$ output branches, and
the number of stages reduces to $N/(r \log_2 M)$. (b) Channel decoding times (for 16-QAM) of SPIRAL and GNU Radio
decoders.

(a)

| Modulation | Bit-level decoder | Symbol-level decoder |
|---|---|---|
| BPSK | 2KN | N.A. |
| QPSK | 2KN | 2KN |
| 16-QAM | 2KN | 2KN |
| 64-QAM | 2KN | 8KN/3 |
| M-QAM | 2KN | $2^{r \log_2 M} \cdot K \cdot N / (r \log_2 M)$ |

(b)

| Spiral Optimized (ms) | Spiral Standard (ms) | GNU Radio Bit-level (ms) | GNU Radio Symbol-level (ms) |
|---|---|---|---|
| 0.48 | 2.21 | 2.18 | 2.29 |

TABLE III: Demodulation time comparison between bit-level and symbol-level NCMA demodulators using GNU Radio.

| Modulation | Point-to-point (ms) | Bit-level PNC (ms) | Bit-level MUD (ms) | Bit-level NCMA (ms) | Symbol-level PNC (ms) | Symbol-level MUD (ms) | Symbol-level NCMA (ms) |
|---|---|---|---|---|---|---|---|
| BPSK | 0.29 | 0.72 | 0.79 | 0.81 | N.A. | N.A. | N.A. |
| QPSK | 0.79 | 1.72 | 1.92 | 2.23 | 1.84 | 1.96 | 2.40 |
| 16-QAM | 1.99 | 113.54 | 124.63 | 137.09 | 149.96 | 160.71 | 180.36 |
| 16-QAM (reduced complexity) | N.A. | 58.46 | 16.49 | 58.87 | 60.72 | 16.51 | 60.92 |


For channel decoders, we use 16-QAM modulation to
test their speed performance. As a benchmark, in Table
II(b), we compare the channel decoding times (excluding
the demodulation time) with the bit-level SPIRAL Viterbi
decoder that has been optimized at the machine code level
[37]. We can see that there is a big gap (i.e., 5 times)
between the GNU Radio decoder and the SPIRAL optimized
decoder. However, the standard SPIRAL decoder and GNU
Radio are comparable in speed. Furthermore, the symbol-level
and bit-level GNU Radio decoders have comparable
speed. The experimental processing time corroborates the
theoretical complexity analysis in Table II(a). We believe that
the symbol-level channel decoding latencies can be further
reduced by following approaches similar to SPIRAL.
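
The theoretical complexities of Table II(a) follow from counting trellis branches, which the short Python sketch below reproduces. The function names and the packet length N = 1200 source bits are hypothetical; K = 64 and r = 1/2 are the values used in this paper.

```python
import math


def bit_level_complexity(k_states, n_bits):
    """Bit-level Viterbi decoding: N stages, K states, two branches per state."""
    return 2 * k_states * n_bits


def symbol_level_complexity(m, k_states, n_bits, rate=0.5):
    """Symbol-level Viterbi decoding for M-QAM.

    r*log2(M) trellis stages are merged into one symbol-level stage, so each
    state has 2**(r*log2(M)) output branches and N/(r*log2(M)) stages remain.
    """
    merged = rate * math.log2(m)
    branches_per_state = 2 ** merged
    stages = n_bits / merged
    return branches_per_state * k_states * stages


K, N = 64, 1200  # K from the paper; N is a hypothetical packet length
for m in (4, 16, 64):
    ratio = symbol_level_complexity(m, K, N) / bit_level_complexity(K, N)
    print(f"{m}-QAM: symbol-level / bit-level complexity = {ratio:.3f}")
```

The ratio stays at one for QPSK and 16-QAM and rises to 4/3 for 64-QAM, in line with the 8KN/3 entry of the table.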

For demodulators, we can see from Table III that the
processing time for QPSK MIMO-NCMA is small and comparable
to the previous BPSK single-antenna case. However,
the demodulation time for 16-QAM increases significantly
compared with that of QPSK, because both 16-QAM bit-level
and symbol-level demodulators have to compute 256
Euclidean distances, while QPSK only needs to compute
16 Euclidean distances. Recall that, in Section VI-C and
Appendix A, we find that the MUD demodulator's complexity
can be reduced without sacrificing any BER performance.
The decoding times for our proposed reduced-complexity
demodulators are also shown in Table III. We can see that
the demodulation time is greatly reduced after simplification.